linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] namei: permit linking with CAP_FOWNER in userns
@ 2015-10-10 14:59 Dirk Steinmetz
  2015-10-20 14:09 ` Dirk Steinmetz
  2015-11-03 17:51 ` Kees Cook
  0 siblings, 2 replies; 35+ messages in thread
From: Dirk Steinmetz @ 2015-10-10 14:59 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel, linux-kernel; +Cc: Dirk Steinmetz

Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
within an unprivileged user namespace fails, even if CAP_FOWNER is held
within the namespace. This may cause various failures, such as a gentoo
installation within a lxc container failing to build and install specific
packages.

This change permits hardlinking of files owned by mapped uids, if
CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
by using the existing inode_owner_or_capable(), which is aware of
namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change
inode_capable to capable_wrt_inode_uidgid").

Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
---
 fs/namei.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 726d211..29fc6a6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -955,26 +955,23 @@ static bool safe_hardlink_source(struct inode *inode)
  *  - sysctl_protected_hardlinks enabled
  *  - fsuid does not match inode
  *  - hardlink source is unsafe (see safe_hardlink_source() above)
- *  - not CAP_FOWNER
+ *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
  *
  * Returns 0 if successful, -ve on error.
  */
 static int may_linkat(struct path *link)
 {
-	const struct cred *cred;
 	struct inode *inode;
 
 	if (!sysctl_protected_hardlinks)
 		return 0;
 
-	cred = current_cred();
 	inode = link->dentry->d_inode;
 
 	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
 	 * otherwise, it must be a safe source.
 	 */
-	if (uid_eq(cred->fsuid, inode->i_uid) || safe_hardlink_source(inode) ||
-	    capable(CAP_FOWNER))
+	if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
 		return 0;
 
 	audit_log_link_denied("linkat", link);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-10 14:59 [PATCH] namei: permit linking with CAP_FOWNER in userns Dirk Steinmetz
@ 2015-10-20 14:09 ` Dirk Steinmetz
  2015-10-27 14:33   ` Seth Forshee
  2015-11-03 17:51 ` Kees Cook
  1 sibling, 1 reply; 35+ messages in thread
From: Dirk Steinmetz @ 2015-10-20 14:09 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel, linux-kernel; +Cc: Dirk Steinmetz

Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
within an unprivileged user namespace fails, even if CAP_FOWNER is held
within the namespace. This may cause various failures, such as a gentoo
installation within a lxc container failing to build and install specific
packages.

This change permits hardlinking of files owned by mapped uids, if
CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
by using the existing inode_owner_or_capable(), which is aware of
namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change
inode_capable to capable_wrt_inode_uidgid").

Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
---
This is the third time I'm sending the patch, as the first two attempts did
not provoke a reply. Feel free to point out any issues you see with it --
including formal requirements, as this is the first patch I'm submitting.
I'd really appreciate your time.

Maybe a bit of rationale behind it would be helpful as well: some linux
distributions, especially gentoo in which I discovered the behaviour,
rely on root being able to hardlink arbitrary files. In the case of gentoo,
this happens when building and installing 'man': the built binary has the
suid-flag set and is owned by a user 'man'. The installation script
(running as root) then attempts to insert a hardlink towards that binary.

Thanks to user namespaces, a regular user can use subuids to create a user
namespace, and acquire root-like capabilities within said namespace. It is
then possible to install and use arbitrary linux distributions within such
namespaces. When installing gentoo in that manner, building and installing
'man' fails, as may_linkat checks the capabilities in the init namespace,
where the installation process is owned by a regular user.

In my opinion may_linkat should permit linking in this case, as the file to
link to is owned by one of the regular user's mapped subids. Note that, in
the scenario described above, it is already possible to create the hardlink
through other means (the following listing is from an unprivileged user
namespace):
> # cat /proc/$$/status | grep CapEff
> CapEff:	0000003cfdfeffff
> # ls -l
> total 0
> -rwSr--r-- 1 nobody nobody 0 Oct 20 15:40 file
> # ln file link
> ln: failed to create hard link 'link' => 'file': Operation not permitted
> # su nobody -s /bin/bash -c "ln file link"
> # ls -l
> total 0
> -rwSr--r-- 2 nobody nobody 0 Oct 20 15:40 file
> -rwSr--r-- 2 nobody nobody 0 Oct 20 15:40 link
As you can see, the process has CAP_FOWNER in the namespace, but cannot
hardlink the file owned by 'nobody'. It can, however, use su to switch to
'nobody' and then create the link. After applying this patch, linking
works as expected.

Diffstat:
 fs/namei.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 33e9495..0d3340b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -955,26 +955,23 @@ static bool safe_hardlink_source(struct inode *inode)
  *  - sysctl_protected_hardlinks enabled
  *  - fsuid does not match inode
  *  - hardlink source is unsafe (see safe_hardlink_source() above)
- *  - not CAP_FOWNER
+ *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
  *
  * Returns 0 if successful, -ve on error.
  */
 static int may_linkat(struct path *link)
 {
-	const struct cred *cred;
 	struct inode *inode;
 
 	if (!sysctl_protected_hardlinks)
 		return 0;
 
-	cred = current_cred();
 	inode = link->dentry->d_inode;
 
 	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
 	 * otherwise, it must be a safe source.
 	 */
-	if (uid_eq(cred->fsuid, inode->i_uid) || safe_hardlink_source(inode) ||
-	    capable(CAP_FOWNER))
+	if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
 		return 0;
 
 	audit_log_link_denied("linkat", link);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-20 14:09 ` Dirk Steinmetz
@ 2015-10-27 14:33   ` Seth Forshee
  2015-10-27 18:08     ` Dirk Steinmetz
  2015-10-27 21:04     ` [PATCH] namei: permit linking with CAP_FOWNER in userns Eric W. Biederman
  0 siblings, 2 replies; 35+ messages in thread
From: Seth Forshee @ 2015-10-27 14:33 UTC (permalink / raw)
  To: Dirk Steinmetz
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Eric W. Biederman,
	Andy Lutomirski, Kees Cook, Serge Hallyn

On Tue, Oct 20, 2015 at 04:09:19PM +0200, Dirk Steinmetz wrote:
> Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
> within an unprivileged user namespace fails, even if CAP_FOWNER is held
> within the namespace. This may cause various failures, such as a gentoo
> installation within a lxc container failing to build and install specific
> packages.
> 
> This change permits hardlinking of files owned by mapped uids, if
> CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
> by using the existing inode_owner_or_capable(), which is aware of
> namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change
> inode_capable to capable_wrt_inode_uidgid").
> 
> Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>

Tested-by: Seth Forshee <seth.forshee@canonical.com>

This is hitting us in Ubuntu during some dpkg upgrades in containers.
When upgrading a file dpkg creates a hard link to the old file to back
it up before overwriting it. When packages upgrade suid files owned by a
non-root user the link isn't permitted, and the package upgrade fails.
This patch fixes our problem.

I did want to point what seems to be an inconsistency in how
capabilities in user namespaces are handled with respect to inodes. When
I started looking at this my initial thought was to replace
capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
the face of it this should be equivalent to what's done here, but it
turns out that capable_wrt_inode_uidgid requires that the inode's uid
and gid are both mapped into the namespace whereas
inode_owner_or_capable only requires the uid be mapped. I'm not sure how
significant that is, but it seems a bit odd.

Seth

> ---
> This is the third time I'm sending the patch, as the first two attempts did
> not provoke a reply. Feel free to point out any issues you see with it --
> including formal requirements, as this is the first patch I'm submitting.
> I'd really appreciate your time.
> 
> Maybe a bit of rationale behind it would be helpful as well: some linux
> distributions, especially gentoo in which I discovered the behaviour,
> rely on root being able to hardlink arbitrary files. In the case of gentoo,
> this happens when building and installing 'man': the built binary has the
> suid-flag set and is owned by a user 'man'. The installation script
> (running as root) then attempts to insert a hardlink towards that binary.
> 
> Thanks to user namespaces, a regular user can use subuids to create a user
> namespace, and acquire root-like capabilities within said namespace. It is
> then possible to install and use arbitrary linux distributions within such
> namespaces. When installing gentoo in that manner, building and installing
> 'man' fails, as may_linkat checks the capabilities in the init namespace,
> where the installation process is owned by a regular user.
> 
> In my opinion may_linkat should permit linking in this case, as the file to
> link to is owned by one of the regular user's mapped subids. Note that, in
> the scenario described above, it is already possible to create the hardlink
> through other means (the following listing is from an unprivileged user
> namespace):
> > # cat /proc/$$/status | grep CapEff
> > CapEff:	0000003cfdfeffff
> > # ls -l
> > total 0
> > -rwSr--r-- 1 nobody nobody 0 Oct 20 15:40 file
> > # ln file link
> > ln: failed to create hard link 'link' => 'file': Operation not permitted
> > # su nobody -s /bin/bash -c "ln file link"
> > # ls -l
> > total 0
> > -rwSr--r-- 2 nobody nobody 0 Oct 20 15:40 file
> > -rwSr--r-- 2 nobody nobody 0 Oct 20 15:40 link
> As you can see, the process has CAP_FOWNER in the namespace, but cannot
> hardlink the file owned by 'nobody'. It can, however, use su to switch to
> 'nobody' and then create the link. After applying this patch, linking
> works as expected.
> 
> Diffstat:
>  fs/namei.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 33e9495..0d3340b 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -955,26 +955,23 @@ static bool safe_hardlink_source(struct inode *inode)
>   *  - sysctl_protected_hardlinks enabled
>   *  - fsuid does not match inode
>   *  - hardlink source is unsafe (see safe_hardlink_source() above)
> - *  - not CAP_FOWNER
> + *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
>   *
>   * Returns 0 if successful, -ve on error.
>   */
>  static int may_linkat(struct path *link)
>  {
> -	const struct cred *cred;
>  	struct inode *inode;
>  
>  	if (!sysctl_protected_hardlinks)
>  		return 0;
>  
> -	cred = current_cred();
>  	inode = link->dentry->d_inode;
>  
>  	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
>  	 * otherwise, it must be a safe source.
>  	 */
> -	if (uid_eq(cred->fsuid, inode->i_uid) || safe_hardlink_source(inode) ||
> -	    capable(CAP_FOWNER))
> +	if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
>  		return 0;
>  
>  	audit_log_link_denied("linkat", link);
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-27 14:33   ` Seth Forshee
@ 2015-10-27 18:08     ` Dirk Steinmetz
  2015-10-27 20:28       ` Serge Hallyn
  2015-10-27 21:04     ` [PATCH] namei: permit linking with CAP_FOWNER in userns Eric W. Biederman
  1 sibling, 1 reply; 35+ messages in thread
From: Dirk Steinmetz @ 2015-10-27 18:08 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Eric W. Biederman,
	Andy Lutomirski, Kees Cook, Serge Hallyn, Dirk Steinmetz

On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote:
> On Tue, Oct 20, 2015 at 04:09:19PM +0200, Dirk Steinmetz wrote:
> > Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
> > within an unprivileged user namespace fails, even if CAP_FOWNER is held
> > within the namespace. This may cause various failures, such as a gentoo
> > installation within a lxc container failing to build and install specific
> > packages.
> > 
> > This change permits hardlinking of files owned by mapped uids, if
> > CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
> > by using the existing inode_owner_or_capable(), which is aware of
> > namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change
> > inode_capable to capable_wrt_inode_uidgid").
> > 
> > Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
> 
> Tested-by: Seth Forshee <seth.forshee@canonical.com>
> 
> This is hitting us in Ubuntu during some dpkg upgrades in containers.
> When upgrading a file dpkg creates a hard link to the old file to back
> it up before overwriting it. When packages upgrade suid files owned by a
> non-root user the link isn't permitted, and the package upgrade fails.
> This patch fixes our problem.
> 
> I did want to point what seems to be an inconsistency in how
> capabilities in user namespaces are handled with respect to inodes. When
> I started looking at this my initial thought was to replace
> capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> the face of it this should be equivalent to what's done here, but it
> turns out that capable_wrt_inode_uidgid requires that the inode's uid
> and gid are both mapped into the namespace whereas
> inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> significant that is, but it seems a bit odd.

I agree that this seems odd. I've chosen inode_owner_or_capable over
capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent:
a privileged user (with CAP_SETUID) can impersonate the owner UID and thus
bypass the check completely; this also matches the documented behavior of
CAP_FOWNER: "Bypass permission checks on operations that normally require
the filesystem UID of the process to match the UID of the file".

However, thinking about it I get the feeling that checking the gid seems
reasonable as well. This is, however, independently of user namespaces.
Consider the following scenario in any namespace, including the init one:
- A file has the setgid and user/group executable bits set, and is owned
  by user:group.
- The user 'user' is not in the group 'group', and does not have any
  capabilities.
- The user 'user' hardlinks the file. The permission check will succeed,
  as the user is the owner of the file.
- The file is replaced with a newer version (for example fixing a security
  issue)
- Now user can still use the hardlink-pinned version to execute the file
  as 'user:group' (and for example exploit the security issue).
I would have expected the user to not be able to hardlink, as he lacks
CAP_FSETID, and thus is not allowed to chmod, change or move the file
without loosing the setgid bit. So it is impossible for him to make a non-
hardlink copy with the setgid bit set -- why should he be able to make a
hardlinked one?

It seems to me as if may_linkat would additionally require a check
verifying that either
- not both setgid and group executable bit set
- fsgid == owner gid
- capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on
  whether to adapt chmod's behavior or keeping everything hardlink-
  related in CAP_FOWNER; I don't feel qualified enough to pick ;)
This would change documented behavior (at least man proc.5's description
of /proc/sys/fs/protected_hardlinks), and I'd consider it a separate
issue, if any (as I'm unsure how realistic that scenario is). I'd
appreciate comments on that.

For other situations than setgid-executable files I do not see issues with
not checking the group id's mapping, as linking would be permitted without
privileges outside of the user namespace (disregarding namespace-internal
setuid bits).

Independently of that, it might be reasonable to consider switching
inode_owner_or_capable towards checking the gid as well and define
something along "uid checks in user namespaces with uid/gid maps require
the file's uid and gid to be mapped, else they will fail" for consistency.

Dirk


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-27 18:08     ` Dirk Steinmetz
@ 2015-10-27 20:28       ` Serge Hallyn
  2015-10-28 15:07         ` Dirk Steinmetz
  0 siblings, 1 reply; 35+ messages in thread
From: Serge Hallyn @ 2015-10-27 20:28 UTC (permalink / raw)
  To: Dirk Steinmetz
  Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-kernel,
	Eric W. Biederman, Andy Lutomirski, Kees Cook, Serge Hallyn

Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote:
> > On Tue, Oct 20, 2015 at 04:09:19PM +0200, Dirk Steinmetz wrote:
> > > Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
> > > within an unprivileged user namespace fails, even if CAP_FOWNER is held
> > > within the namespace. This may cause various failures, such as a gentoo
> > > installation within a lxc container failing to build and install specific
> > > packages.
> > > 
> > > This change permits hardlinking of files owned by mapped uids, if
> > > CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
> > > by using the existing inode_owner_or_capable(), which is aware of
> > > namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change
> > > inode_capable to capable_wrt_inode_uidgid").
> > > 
> > > Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
> > 
> > Tested-by: Seth Forshee <seth.forshee@canonical.com>
> > 
> > This is hitting us in Ubuntu during some dpkg upgrades in containers.
> > When upgrading a file dpkg creates a hard link to the old file to back
> > it up before overwriting it. When packages upgrade suid files owned by a
> > non-root user the link isn't permitted, and the package upgrade fails.
> > This patch fixes our problem.
> > 
> > I did want to point what seems to be an inconsistency in how
> > capabilities in user namespaces are handled with respect to inodes. When
> > I started looking at this my initial thought was to replace
> > capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> > the face of it this should be equivalent to what's done here, but it
> > turns out that capable_wrt_inode_uidgid requires that the inode's uid
> > and gid are both mapped into the namespace whereas
> > inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> > significant that is, but it seems a bit odd.
> 
> I agree that this seems odd. I've chosen inode_owner_or_capable over
> capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent:
> a privileged user (with CAP_SETUID) can impersonate the owner UID and thus
> bypass the check completely; this also matches the documented behavior of
> CAP_FOWNER: "Bypass permission checks on operations that normally require
> the filesystem UID of the process to match the UID of the file".
> 
> However, thinking about it I get the feeling that checking the gid seems
> reasonable as well. This is, however, independently of user namespaces.
> Consider the following scenario in any namespace, including the init one:
> - A file has the setgid and user/group executable bits set, and is owned
>   by user:group.
> - The user 'user' is not in the group 'group', and does not have any
>   capabilities.
> - The user 'user' hardlinks the file. The permission check will succeed,
>   as the user is the owner of the file.
> - The file is replaced with a newer version (for example fixing a security
>   issue)
> - Now user can still use the hardlink-pinned version to execute the file
>   as 'user:group' (and for example exploit the security issue).
> I would have expected the user to not be able to hardlink, as he lacks
> CAP_FSETID, and thus is not allowed to chmod, change or move the file
> without loosing the setgid bit. So it is impossible for him to make a non-
> hardlink copy with the setgid bit set -- why should he be able to make a
> hardlinked one?

Yeah, this sounds sensible.  It allows a user without access to 'disk',
for instance, to become that group.

> It seems to me as if may_linkat would additionally require a check
> verifying that either
> - not both setgid and group executable bit set
> - fsgid == owner gid
> - capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on
>   whether to adapt chmod's behavior or keeping everything hardlink-
>   related in CAP_FOWNER; I don't feel qualified enough to pick ;)

In particular just changing it is not ok since people who are using file
capabilities to grant what they currently need would be stuck with a
mysterious new failure.

> This would change documented behavior (at least man proc.5's description
> of /proc/sys/fs/protected_hardlinks), and I'd consider it a separate
> issue, if any (as I'm unsure how realistic that scenario is). I'd
> appreciate comments on that.
> 
> For other situations than setgid-executable files I do not see issues with
> not checking the group id's mapping, as linking would be permitted without
> privileges outside of the user namespace (disregarding namespace-internal
> setuid bits).
> 
> Independently of that, it might be reasonable to consider switching
> inode_owner_or_capable towards checking the gid as well and define
> something along "uid checks in user namespaces with uid/gid maps require
> the file's uid and gid to be mapped, else they will fail" for consistency.
> 
> Dirk
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-27 14:33   ` Seth Forshee
  2015-10-27 18:08     ` Dirk Steinmetz
@ 2015-10-27 21:04     ` Eric W. Biederman
  1 sibling, 0 replies; 35+ messages in thread
From: Eric W. Biederman @ 2015-10-27 21:04 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Dirk Steinmetz, Alexander Viro, linux-fsdevel, linux-kernel,
	Andy Lutomirski, Kees Cook, Serge Hallyn

Seth Forshee <seth.forshee@canonical.com> writes:

> On Tue, Oct 20, 2015 at 04:09:19PM +0200, Dirk Steinmetz wrote:
>> Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
>> within an unprivileged user namespace fails, even if CAP_FOWNER is held
>> within the namespace. This may cause various failures, such as a gentoo
>> installation within a lxc container failing to build and install specific
>> packages.
>> 
>> This change permits hardlinking of files owned by mapped uids, if
>> CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
>> by using the existing inode_owner_or_capable(), which is aware of
>> namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change
>> inode_capable to capable_wrt_inode_uidgid").
>> 
>> Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
>
> Tested-by: Seth Forshee <seth.forshee@canonical.com>

If multiple groups are hitting this issue for different reasons
I am applying the supplied patch.

> This is hitting us in Ubuntu during some dpkg upgrades in containers.
> When upgrading a file dpkg creates a hard link to the old file to back
> it up before overwriting it. When packages upgrade suid files owned by a
> non-root user the link isn't permitted, and the package upgrade fails.
> This patch fixes our problem.
>
> I did want to point what seems to be an inconsistency in how
> capabilities in user namespaces are handled with respect to inodes. When
> I started looking at this my initial thought was to replace
> capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> the face of it this should be equivalent to what's done here, but it
> turns out that capable_wrt_inode_uidgid requires that the inode's uid
> and gid are both mapped into the namespace whereas
> inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> significant that is, but it seems a bit odd.

It is a bit odd.

inode_owner_or_capable in this context is a gimme, as only being the
owner of the file in question is enough to create a hard link, and root
(in the user namespace) can become that user.



That said I think there have been some legitimate questions about setgid
executables in may_linkat (raised down thread), as well as legitimate
questions about capable_wrt_uidgid.  I will add the additional question
is it sane for us to ignore the acls in capable_wrt_uidgid.

All of this appears to be an area that no one except bad actors cares
about so I expect we can change things without causing regressions, and
on that note I encourage the conversation on the oddness to continue.

Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-27 20:28       ` Serge Hallyn
@ 2015-10-28 15:07         ` Dirk Steinmetz
  2015-10-28 17:33           ` Serge Hallyn
  0 siblings, 1 reply; 35+ messages in thread
From: Dirk Steinmetz @ 2015-10-28 15:07 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-kernel,
	Eric W. Biederman, Andy Lutomirski, Kees Cook, Serge Hallyn

On Tue, 27 Oct 2015 20:28:02 +0000, Serge Hallyn wrote:
> Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> > On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote:
> > > I did want to point what seems to be an inconsistency in how
> > > capabilities in user namespaces are handled with respect to inodes. When
> > > I started looking at this my initial thought was to replace
> > > capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> > > the face of it this should be equivalent to what's done here, but it
> > > turns out that capable_wrt_inode_uidgid requires that the inode's uid
> > > and gid are both mapped into the namespace whereas
> > > inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> > > significant that is, but it seems a bit odd.
> > 
> > I agree that this seems odd. I've chosen inode_owner_or_capable over
> > capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent:
> > a privileged user (with CAP_SETUID) can impersonate the owner UID and thus
> > bypass the check completely; this also matches the documented behavior of
> > CAP_FOWNER: "Bypass permission checks on operations that normally require
> > the filesystem UID of the process to match the UID of the file".
> > 
> > However, thinking about it I get the feeling that checking the gid seems
> > reasonable as well. This is, however, independently of user namespaces.
> > Consider the following scenario in any namespace, including the init one:
> > - A file has the setgid and user/group executable bits set, and is owned
> >   by user:group.
> > - The user 'user' is not in the group 'group', and does not have any
> >   capabilities.
> > - The user 'user' hardlinks the file. The permission check will succeed,
> >   as the user is the owner of the file.
> > - The file is replaced with a newer version (for example fixing a security
> >   issue)
> > - Now user can still use the hardlink-pinned version to execute the file
> >   as 'user:group' (and for example exploit the security issue).
> > I would have expected the user to not be able to hardlink, as he lacks
> > CAP_FSETID, and thus is not allowed to chmod, change or move the file
> > without loosing the setgid bit. So it is impossible for him to make a non-
> > hardlink copy with the setgid bit set -- why should he be able to make a
> > hardlinked one?
> 
> Yeah, this sounds sensible.  It allows a user without access to 'disk',
> for instance, to become that group.
> 
> > It seems to me as if may_linkat would additionally require a check
> > verifying that either
> > - not both setgid and group executable bit set
> > - fsgid == owner gid
> > - capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on
> >   whether to adapt chmod's behavior or keeping everything hardlink-
> >   related in CAP_FOWNER; I don't feel qualified enough to pick ;)
> 
> In particular just changing it is not ok since people who are using file
> capabilities to grant what they currently need would be stuck with a
> mysterious new failure.

Is there any use case (besides exploiting hardlinks with malicious intent)
that would be broken when changing this? There are some (imho) rather
unlikely conditions to be met in order to observe changed behavior:
- a user owns an executable setgid-file belonging to a group he is not in
- the user does not have CAP_FSETID (or CAP_FOWNER, depending on which one
  is chosen to be required)
- the user is for some legitimate reason supposed to hardlink the file
If these conditions are not met in practice, the change would not break
anything. In that case, it would be imho better to not provide
backward-compatibility to reduce complexity in these checks. Else, I'd
propose adding a new possible value '2' for
/proc/sys/fs/protected_hardlinks, while keeping '1' for the current
behavior.

I can provide an initial draft for either of the options, but would like
recommendations to which of the two ways to take (or is there a third
one?), as well as comments on the new condition itself: may_linkat would
block hardlinks when all of the following conditions are met:
- sysctrl_protected_hardlinks is enabled or 2 (depending on way)
- inode uid != fsuid and no CAP_FOWNER (for userns: with mapping on uid),
  while the hardlink source is not a regular file, is a setuid-executable
  or is not accessible for reading and writing
- inode gid not fsgid or in supplemental gids and no CAP_FSETID (for
  userns: with mapping on gid -- not sure whether the uid is relevant?),
  while the hardlink source is a setgid-executable (with group executable
  bit set)

If anyone else wants to fix the issue, thats fine with me as well.

Dirk

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-28 15:07         ` Dirk Steinmetz
@ 2015-10-28 17:33           ` Serge Hallyn
  2015-11-02 15:10             ` Dirk Steinmetz
  0 siblings, 1 reply; 35+ messages in thread
From: Serge Hallyn @ 2015-10-28 17:33 UTC (permalink / raw)
  To: Dirk Steinmetz
  Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-kernel,
	Eric W. Biederman, Andy Lutomirski, Kees Cook, Serge Hallyn

Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> On Tue, 27 Oct 2015 20:28:02 +0000, Serge Hallyn wrote:
> > Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> > > On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote:
> > > > I did want to point what seems to be an inconsistency in how
> > > > capabilities in user namespaces are handled with respect to inodes. When
> > > > I started looking at this my initial thought was to replace
> > > > capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> > > > the face of it this should be equivalent to what's done here, but it
> > > > turns out that capable_wrt_inode_uidgid requires that the inode's uid
> > > > and gid are both mapped into the namespace whereas
> > > > inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> > > > significant that is, but it seems a bit odd.
> > > 
> > > I agree that this seems odd. I've chosen inode_owner_or_capable over
> > > capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent:
> > > a privileged user (with CAP_SETUID) can impersonate the owner UID and thus
> > > bypass the check completely; this also matches the documented behavior of
> > > CAP_FOWNER: "Bypass permission checks on operations that normally require
> > > the filesystem UID of the process to match the UID of the file".
> > > 
> > > However, thinking about it I get the feeling that checking the gid seems
> > > reasonable as well. This is, however, independently of user namespaces.
> > > Consider the following scenario in any namespace, including the init one:
> > > - A file has the setgid and user/group executable bits set, and is owned
> > >   by user:group.
> > > - The user 'user' is not in the group 'group', and does not have any
> > >   capabilities.
> > > - The user 'user' hardlinks the file. The permission check will succeed,
> > >   as the user is the owner of the file.
> > > - The file is replaced with a newer version (for example fixing a security
> > >   issue)
> > > - Now user can still use the hardlink-pinned version to execute the file
> > >   as 'user:group' (and for example exploit the security issue).
> > > I would have expected the user to not be able to hardlink, as he lacks
> > > CAP_FSETID, and thus is not allowed to chmod, change or move the file
> > > without loosing the setgid bit. So it is impossible for him to make a non-
> > > hardlink copy with the setgid bit set -- why should he be able to make a
> > > hardlinked one?
> > 
> > Yeah, this sounds sensible.  It allows a user without access to 'disk',
> > for instance, to become that group.
> > 
> > > It seems to me as if may_linkat would additionally require a check
> > > verifying that either
> > > - not both setgid and group executable bit set
> > > - fsgid == owner gid
> > > - capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on
> > >   whether to adapt chmod's behavior or keeping everything hardlink-
> > >   related in CAP_FOWNER; I don't feel qualified enough to pick ;)
> > 
> > In particular just changing it is not ok since people who are using file
> > capabilities to grant what they currently need would be stuck with a
> > mysterious new failure.
> 
> Is there any use case (besides exploiting hardlinks with malicious intent)
> that would be broken when changing this? There are some (imho) rather
> unlikely conditions to be met in order to observe changed behavior:

The simplest example would be if I wanted to run a very quick program to
just add the symbolic link.  Let's say the link /usr/sbin/uuidd were owned
by root:disk and setuid and setgid.  The proposed change would force me
to bind in both the root user and disk group, whereas without it I can
just bind in only the root user.

We've already dealt with such regressions and iirc agreed that they were
worthwhile.

> - a user owns an executable setgid-file belonging to a group he is not in
> - the user does not have CAP_FSETID (or CAP_FOWNER, depending on which one
>   is chosen to be required)
> - the user is for some legitimate reason supposed to hardlink the file
> If these conditions are not met in practice, the change would not break
> anything. In that case, it would be imho better to not provide
> backward-compatibility to reduce complexity in these checks. Else, I'd
> propose adding a new possible value '2' for
> /proc/sys/fs/protected_hardlinks, while keeping '1' for the current
> behavior.
> 
> I can provide an initial draft for either of the options, but would like
> recommendations to which of the two ways to take (or is there a third
> one?), as well as comments on the new condition itself: may_linkat would
> block hardlinks when all of the following conditions are met:
> - sysctrl_protected_hardlinks is enabled or 2 (depending on way)
> - inode uid != fsuid and no CAP_FOWNER (for userns: with mapping on uid),
>   while the hardlink source is not a regular file, is a setuid-executable
>   or is not accessible for reading and writing
> - inode gid not fsgid or in supplemental gids and no CAP_FSETID (for
>   userns: with mapping on gid -- not sure whether the uid is relevant?),
>   while the hardlink source is a setgid-executable (with group executable
>   bit set)
> 
> If anyone else wants to fix the issue, thats fine with me as well.
> 
> Dirk

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-28 17:33           ` Serge Hallyn
@ 2015-11-02 15:10             ` Dirk Steinmetz
  2015-11-02 18:02               ` Serge Hallyn
  0 siblings, 1 reply; 35+ messages in thread
From: Dirk Steinmetz @ 2015-11-02 15:10 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-kernel,
	Eric W. Biederman, Andy Lutomirski, Kees Cook, Serge Hallyn

On Wed, 28 Oct 2015 17:33:10 +0000, Serge Hallyn wrote:
> Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> > On Tue, 27 Oct 2015 20:28:02 +0000, Serge Hallyn wrote:
> > > Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> > > > On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote:
> > > > > I did want to point what seems to be an inconsistency in how
> > > > > capabilities in user namespaces are handled with respect to inodes. When
> > > > > I started looking at this my initial thought was to replace
> > > > > capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> > > > > the face of it this should be equivalent to what's done here, but it
> > > > > turns out that capable_wrt_inode_uidgid requires that the inode's uid
> > > > > and gid are both mapped into the namespace whereas
> > > > > inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> > > > > significant that is, but it seems a bit odd.
> > > > 
> > > > I agree that this seems odd. I've chosen inode_owner_or_capable over
> > > > capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent:
> > > > a privileged user (with CAP_SETUID) can impersonate the owner UID and thus
> > > > bypass the check completely; this also matches the documented behavior of
> > > > CAP_FOWNER: "Bypass permission checks on operations that normally require
> > > > the filesystem UID of the process to match the UID of the file".
> > > > 
> > > > However, thinking about it I get the feeling that checking the gid seems
> > > > reasonable as well. This is, however, independently of user namespaces.
> > > > Consider the following scenario in any namespace, including the init one:
> > > > - A file has the setgid and user/group executable bits set, and is owned
> > > >   by user:group.
> > > > - The user 'user' is not in the group 'group', and does not have any
> > > >   capabilities.
> > > > - The user 'user' hardlinks the file. The permission check will succeed,
> > > >   as the user is the owner of the file.
> > > > - The file is replaced with a newer version (for example fixing a security
> > > >   issue)
> > > > - Now user can still use the hardlink-pinned version to execute the file
> > > >   as 'user:group' (and for example exploit the security issue).
> > > > I would have expected the user to not be able to hardlink, as he lacks
> > > > CAP_FSETID, and thus is not allowed to chmod, change or move the file
> > > > without loosing the setgid bit. So it is impossible for him to make a non-
> > > > hardlink copy with the setgid bit set -- why should he be able to make a
> > > > hardlinked one?
> > > 
> > > Yeah, this sounds sensible.  It allows a user without access to 'disk',
> > > for instance, to become that group.
> > > 
> > > > It seems to me as if may_linkat would additionally require a check
> > > > verifying that either
> > > > - not both setgid and group executable bit set
> > > > - fsgid == owner gid
> > > > - capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on
> > > >   whether to adapt chmod's behavior or keeping everything hardlink-
> > > >   related in CAP_FOWNER; I don't feel qualified enough to pick ;)
> > > 
> > > In particular just changing it is not ok since people who are using file
> > > capabilities to grant what they currently need would be stuck with a
> > > mysterious new failure.
> > 
> > Is there any use case (besides exploiting hardlinks with malicious intent)
> > that would be broken when changing this? There are some (imho) rather
> > unlikely conditions to be met in order to observe changed behavior:
> 
> The simplest example would be if I wanted to run a very quick program to
> just add the symbolic link.  Let's say the link /usr/sbin/uuidd were owned
> by root:disk and setuid and setgid.  The proposed change would force me
> to bind in both the root user and disk group, whereas without it I can
> just bind in only the root user.
While root usually has CAP_FSETID and CAP_FOWNER, which would still permit
linking in this case, I agree that the change could be visible when
performing specific maintenance tasks in some rare setups.

> We've already dealt with such regressions and iirc agreed that they were
> worthwhile.
Would you prefer to not fix the issue at all, then? Or would you prefer to
add a new value on /proc/sys/fs/protected_hardlinks -- which would still
cause the symptoms you describe on distributions using the new value, but
would be more easy to change for users knowing that this is an issue?

I personally still favor changing the behavior and documentation over a new
value there, as -- once identified by the user -- the user can easily adapt
his usage or disable protected hardlinks globally, depending on the actual
requirements. And another value will not improve the abilities of the user
to identify protected hardlinks as reason for the changed behavior.

> > - a user owns an executable setgid-file belonging to a group he is not in
> > - the user does not have CAP_FSETID (or CAP_FOWNER, depending on which one
> >   is chosen to be required)
> > - the user is for some legitimate reason supposed to hardlink the file
> > If these conditions are not met in practice, the change would not break
> > anything. In that case, it would be imho better to not provide
> > backward-compatibility to reduce complexity in these checks. Else, I'd
> > propose adding a new possible value '2' for
> > /proc/sys/fs/protected_hardlinks, while keeping '1' for the current
> > behavior.
> > 
> > I can provide an initial draft for either of the options, but would like
> > recommendations to which of the two ways to take (or is there a third
> > one?), as well as comments on the new condition itself: may_linkat would
> > block hardlinks when all of the following conditions are met:
> > - sysctrl_protected_hardlinks is enabled or 2 (depending on way)
> > - inode uid != fsuid and no CAP_FOWNER (for userns: with mapping on uid),
> >   while the hardlink source is not a regular file, is a setuid-executable
> >   or is not accessible for reading and writing
> > - inode gid not fsgid or in supplemental gids and no CAP_FSETID (for
> >   userns: with mapping on gid -- not sure whether the uid is relevant?),
> >   while the hardlink source is a setgid-executable (with group executable
> >   bit set)
> > 
> > If anyone else wants to fix the issue, thats fine with me as well.
> > 
> > Dirk


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-11-02 15:10             ` Dirk Steinmetz
@ 2015-11-02 18:02               ` Serge Hallyn
  2015-11-02 19:57                 ` Andy Lutomirski
  0 siblings, 1 reply; 35+ messages in thread
From: Serge Hallyn @ 2015-11-02 18:02 UTC (permalink / raw)
  To: Dirk Steinmetz
  Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-kernel,
	Eric W. Biederman, Andy Lutomirski, Kees Cook, Serge Hallyn

Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> On Wed, 28 Oct 2015 17:33:10 +0000, Serge Hallyn wrote:
> > Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> > > On Tue, 27 Oct 2015 20:28:02 +0000, Serge Hallyn wrote:
> > > > Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> > > > > On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote:
> > > > > > I did want to point what seems to be an inconsistency in how
> > > > > > capabilities in user namespaces are handled with respect to inodes. When
> > > > > > I started looking at this my initial thought was to replace
> > > > > > capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> > > > > > the face of it this should be equivalent to what's done here, but it
> > > > > > turns out that capable_wrt_inode_uidgid requires that the inode's uid
> > > > > > and gid are both mapped into the namespace whereas
> > > > > > inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> > > > > > significant that is, but it seems a bit odd.
> > > > > 
> > > > > I agree that this seems odd. I've chosen inode_owner_or_capable over
> > > > > capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent:
> > > > > a privileged user (with CAP_SETUID) can impersonate the owner UID and thus
> > > > > bypass the check completely; this also matches the documented behavior of
> > > > > CAP_FOWNER: "Bypass permission checks on operations that normally require
> > > > > the filesystem UID of the process to match the UID of the file".
> > > > > 
> > > > > However, thinking about it I get the feeling that checking the gid seems
> > > > > reasonable as well. This is, however, independently of user namespaces.
> > > > > Consider the following scenario in any namespace, including the init one:
> > > > > - A file has the setgid and user/group executable bits set, and is owned
> > > > >   by user:group.
> > > > > - The user 'user' is not in the group 'group', and does not have any
> > > > >   capabilities.
> > > > > - The user 'user' hardlinks the file. The permission check will succeed,
> > > > >   as the user is the owner of the file.
> > > > > - The file is replaced with a newer version (for example fixing a security
> > > > >   issue)
> > > > > - Now user can still use the hardlink-pinned version to execute the file
> > > > >   as 'user:group' (and for example exploit the security issue).
> > > > > I would have expected the user to not be able to hardlink, as he lacks
> > > > > CAP_FSETID, and thus is not allowed to chmod, change or move the file
> > > > > without loosing the setgid bit. So it is impossible for him to make a non-
> > > > > hardlink copy with the setgid bit set -- why should he be able to make a
> > > > > hardlinked one?
> > > > 
> > > > Yeah, this sounds sensible.  It allows a user without access to 'disk',
> > > > for instance, to become that group.
> > > > 
> > > > > It seems to me as if may_linkat would additionally require a check
> > > > > verifying that either
> > > > > - not both setgid and group executable bit set
> > > > > - fsgid == owner gid
> > > > > - capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on
> > > > >   whether to adapt chmod's behavior or keeping everything hardlink-
> > > > >   related in CAP_FOWNER; I don't feel qualified enough to pick ;)
> > > > 
> > > > In particular just changing it is not ok since people who are using file
> > > > capabilities to grant what they currently need would be stuck with a
> > > > mysterious new failure.
> > > 
> > > Is there any use case (besides exploiting hardlinks with malicious intent)
> > > that would be broken when changing this? There are some (imho) rather
> > > unlikely conditions to be met in order to observe changed behavior:
> > 
> > The simplest example would be if I wanted to run a very quick program to
> > just add the symbolic link.  Let's say the link /usr/sbin/uuidd were owned
> > by root:disk and setuid and setgid.  The proposed change would force me
> > to bind in both the root user and disk group, whereas without it I can
> > just bind in only the root user.
> While root usually has CAP_FSETID and CAP_FOWNER, which would still permit
> linking in this case, I agree that the change could be visible when
> performing specific maintenance tasks in some rare setups.
> 
> > We've already dealt with such regressions and iirc agreed that they were
> > worthwhile.
> Would you prefer to not fix the issue at all, then? Or would you prefer to

No.  I think I was saying I think it's worth adding the 'gid must be mapped'
requirement.

And I was saying that changing the capability needed is not ok.

> add a new value on /proc/sys/fs/protected_hardlinks -- which would still
> cause the symptoms you describe on distributions using the new value, but
> would be more easy to change for users knowing that this is an issue?
> 
> I personally still favor changing the behavior and documentation over a new
> value there, as -- once identified by the user -- the user can easily adapt

I agree.

Note the difference - changing the capability required to link the
file can affect (probably rare, but definately) normal, non-user-namespace
setups.  Changing the link requirements in a user namespace so that gid
must be mapped only affects a case which we've previously said should not
be supported.

Linus may still disagree - not changing what userspace can do is pretty
fundamental, but this was purely a missed security fix iiuc.

> his usage or disable protected hardlinks globally, depending on the actual
> requirements. And another value will not improve the abilities of the user
> to identify protected hardlinks as reason for the changed behavior.

Agreed, and more to the point changing the sysctl required woudl still be
breaking userspace.  We may as well require the proper, safe setup, rather
than requiring a unsafe sysctl.

-serge

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-11-02 18:02               ` Serge Hallyn
@ 2015-11-02 19:57                 ` Andy Lutomirski
  2015-11-03  0:39                   ` [RFC] namei: prevent sgid-hardlinks for unmapped gids Dirk Steinmetz
  0 siblings, 1 reply; 35+ messages in thread
From: Andy Lutomirski @ 2015-11-02 19:57 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Dirk Steinmetz, Seth Forshee, Alexander Viro, Linux FS Devel,
	linux-kernel, Eric W. Biederman, Kees Cook, Serge Hallyn

On Mon, Nov 2, 2015 at 10:02 AM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
>>
>> > We've already dealt with such regressions and iirc agreed that they were
>> > worthwhile.
>> Would you prefer to not fix the issue at all, then? Or would you prefer to
>
> No.  I think I was saying I think it's worth adding the 'gid must be mapped'
> requirement.
>
> And I was saying that changing the capability needed is not ok.
>
>> add a new value on /proc/sys/fs/protected_hardlinks -- which would still
>> cause the symptoms you describe on distributions using the new value, but
>> would be more easy to change for users knowing that this is an issue?
>>
>> I personally still favor changing the behavior and documentation over a new
>> value there, as -- once identified by the user -- the user can easily adapt
>
> I agree.
>
> Note the difference - changing the capability required to link the
> file can affect (probably rare, but definately) normal, non-user-namespace
> setups.  Changing the link requirements in a user namespace so that gid
> must be mapped only affects a case which we've previously said should not
> be supported.

I think it would have no effect at all on setups that don't use
userns, so at least the exposure to potential ABI issues would be
small.

>
> Linus may still disagree - not changing what userspace can do is pretty
> fundamental, but this was purely a missed security fix iiuc.

IIRC I just didn't do it because I didn't want to think about it at
the time, and it didn't look like a *big* security issue.

--Andy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-02 19:57                 ` Andy Lutomirski
@ 2015-11-03  0:39                   ` Dirk Steinmetz
  2015-11-03 15:44                     ` Serge Hallyn
  2015-11-03 18:20                     ` Kees Cook
  0 siblings, 2 replies; 35+ messages in thread
From: Dirk Steinmetz @ 2015-11-03  0:39 UTC (permalink / raw)
  To: Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel,
	linux-kernel, Eric W . Biederman, Kees Cook, Serge Hallyn
  Cc: Dirk Steinmetz

In order to hardlink to a sgid-executable, it is sufficient to be the
file's owner. When hardlinking within an unprivileged user namespace, the
users of that namespace could thus use hardlinks to pin setgid binaries
owned by themselves (or any mapped uid, with CAP_FOWNER) and a gid outside
of the namespace. This is a possible security risk.

This change prevents hardlinking of sgid-executables within user
namespaces, if the file is not owned by a mapped gid.

Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
---

MISSING: Documentation/sysctl/fs.txt not updated, as this patch is
intended for discussion.

If there are no further misunderstandings on my side, this patch is what
Serge and I agree on (modulo my not-that-much-linux-kernel-experience
codestyle, feel free to suggest improvements!).

The new condition for sgid-executables is equivalent to
> inode_owner_or_capable(inode) && kgid_has_mapping(ns, inode->i_gid)
which, as recommended by Serge, does not change the behaviour for the init
namespace. It fixes the problem of pinning parent namespace's gids.

However, I think the "same" security issue is also valid within any
namespace, for regular users pinning other gids within the same namespace.
I already presented an example for that in a previous mail:
- A file has the setgid and user/group executable bits set, and is owned
  by user:group.
- The user 'user' is not in the group 'group', and does not have any
  capabilities.
- The user 'user' hardlinks the file. The permission check will succeed,
  as the user is the owner of the file.
- The file is replaced with a newer version (for example fixing a security
  issue)
- Now user can still use the hardlink-pinned version to execute the file
  as 'user:group' (and for example exploit the security issue).

To prevent that, the condition would need to be changed to something like
inode_group_or_capable, resembling inode_owner_or_capable, but checking
that the caller is in the group the inode belongs to or has some
capability (for consistency with former behaviour, CAP_FOWNER? for
consistency with the documentation, CAP_FSETID?). However, this would
change userland behaviour outside of userns. Thus my main question:
Is the scenario above bad enough to change userland behaviour?

I'd apprechiate your comments.

- Dirk


Diffstat:
 fs/namei.c | 47 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 29fc6a6..9c6c2e2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -913,18 +913,19 @@ static inline int may_follow_link(struct nameidata *nd)
 }
 
 /**
- * safe_hardlink_source - Check for safe hardlink conditions
+ * safe_hardlink_source_uid - Check for safe hardlink conditions not dependent
+ * on the inode's group. These conditions may be overridden by inode ownership
+ * or CAP_FOWNER with respect to the inode's uid
  * @inode: the source inode to hardlink from
  *
  * Return false if at least one of the following conditions:
  *    - inode is not a regular file
  *    - inode is setuid
- *    - inode is setgid and group-exec
  *    - access failure for read and write
  *
  * Otherwise returns true.
  */
-static bool safe_hardlink_source(struct inode *inode)
+static bool safe_hardlink_source_uid(struct inode *inode)
 {
 	umode_t mode = inode->i_mode;
 
@@ -936,10 +937,6 @@ static bool safe_hardlink_source(struct inode *inode)
 	if (mode & S_ISUID)
 		return false;
 
-	/* Executable setgid files should not get pinned to the filesystem. */
-	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
-		return false;
-
 	/* Hardlinking to unreadable or unwritable sources is dangerous. */
 	if (inode_permission(inode, MAY_READ | MAY_WRITE))
 		return false;
@@ -948,30 +945,62 @@ static bool safe_hardlink_source(struct inode *inode)
 }
 
 /**
+ * safe_hardlink_source_gid - Check for safe hardlink conditions dependent
+ * on the inode's group. These conditions may be overridden by inode ownership
+ * or CAP_FOWNER with respect to the inode's gid
+ * @inode: the source inode to hardlink from
+ *
+ * Return false if inode is setgid and group-exec
+ *
+ * Otherwise returns true.
+ */
+static bool safe_hardlink_source_gid(struct inode *inode)
+{
+	umode_t mode = inode->i_mode;
+
+	/* Executable setgid files should not get pinned to the filesystem. */
+	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+		return false;
+
+	return true;
+}
+
+/**
  * may_linkat - Check permissions for creating a hardlink
  * @link: the source to hardlink from
  *
  * Block hardlink when all of:
  *  - sysctl_protected_hardlinks enabled
  *  - fsuid does not match inode
- *  - hardlink source is unsafe (see safe_hardlink_source() above)
+ *  - hardlink source is unsafe (see safe_hardlink_source_*() above)
  *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
+ *    (and inode gid mapped, if hardlink conditions depending on the inode's
+ *    group are not satisfied)
  *
  * Returns 0 if successful, -ve on error.
  */
 static int may_linkat(struct path *link)
 {
 	struct inode *inode;
+	struct user_namespace *ns;
+	bool owner;
+	bool safe_uid;
+	bool safe_gid;
 
 	if (!sysctl_protected_hardlinks)
 		return 0;
 
 	inode = link->dentry->d_inode;
+	ns = current_user_ns();
 
 	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
 	 * otherwise, it must be a safe source.
 	 */
-	if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
+	owner = inode_owner_or_capable(inode);
+	safe_uid = safe_hardlink_source_uid(inode) || owner;
+	safe_gid = safe_hardlink_source_gid(inode) ||
+			(owner && kgid_has_mapping(ns, inode->i_gid));
+	if (safe_uid && safe_gid)
 		return 0;
 
 	audit_log_link_denied("linkat", link);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-03  0:39                   ` [RFC] namei: prevent sgid-hardlinks for unmapped gids Dirk Steinmetz
@ 2015-11-03 15:44                     ` Serge Hallyn
  2015-11-03 18:20                     ` Kees Cook
  1 sibling, 0 replies; 35+ messages in thread
From: Serge Hallyn @ 2015-11-03 15:44 UTC (permalink / raw)
  To: Dirk Steinmetz
  Cc: Seth Forshee, Alexander Viro, Linux FS Devel, linux-kernel,
	Eric W . Biederman, Kees Cook, Serge Hallyn

Quoting Dirk Steinmetz (public@rsjtdrjgfuzkfg.com):
> In order to hardlink to a sgid-executable, it is sufficient to be the
> file's owner. When hardlinking within an unprivileged user namespace, the
> users of that namespace could thus use hardlinks to pin setgid binaries
> owned by themselves (or any mapped uid, with CAP_FOWNER) and a gid outside
> of the namespace. This is a possible security risk.
> 
> This change prevents hardlinking of sgid-executables within user
> namespaces, if the file is not owned by a mapped gid.
> 
> Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>

Hey,

Hoping this gets a close review by Kees, but this looks good to me, thanks!

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

> ---
> 
> MISSING: Documentation/sysctl/fs.txt not updated, as this patch is
> intended for discussion.
> 
> If there are no further misunderstandings on my side, this patch is what
> Serge and I agree on (modulo my not-that-much-linux-kernel-experience
> codestyle, feel free to suggest improvements!).
> 
> The new condition for sgid-executables is equivalent to
> > inode_owner_or_capable(inode) && kgid_has_mapping(ns, inode->i_gid)
> which, as recommended by Serge, does not change the behaviour for the init
> namespace. It fixes the problem of pinning parent namespace's gids.
> 
> However, I think the "same" security issue is also valid within any
> namespace, for regular users pinning other gids within the same namespace.
> I already presented an example for that in a previous mail:
> - A file has the setgid and user/group executable bits set, and is owned
>   by user:group.
> - The user 'user' is not in the group 'group', and does not have any
>   capabilities.
> - The user 'user' hardlinks the file. The permission check will succeed,
>   as the user is the owner of the file.
> - The file is replaced with a newer version (for example fixing a security
>   issue)
> - Now user can still use the hardlink-pinned version to execute the file
>   as 'user:group' (and for example exploit the security issue).
> 
> To prevent that, the condition would need to be changed to something like
> inode_group_or_capable, resembling inode_owner_or_capable, but checking
> that the caller is in the group the inode belongs to or has some
> capability (for consistency with former behaviour, CAP_FOWNER? for
> consistency with the documentation, CAP_FSETID?). However, this would
> change userland behaviour outside of userns. Thus my main question:
> Is the scenario above bad enough to change userland behaviour?
> 
> I'd apprechiate your comments.
> 
> - Dirk
> 
> 
> Diffstat:
>  fs/namei.c | 47 ++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 29fc6a6..9c6c2e2 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -913,18 +913,19 @@ static inline int may_follow_link(struct nameidata *nd)
>  }
>  
>  /**
> - * safe_hardlink_source - Check for safe hardlink conditions
> + * safe_hardlink_source_uid - Check for safe hardlink conditions not dependent
> + * on the inode's group. These conditions may be overridden by inode ownership
> + * or CAP_FOWNER with respect to the inode's uid
>   * @inode: the source inode to hardlink from
>   *
>   * Return false if at least one of the following conditions:
>   *    - inode is not a regular file
>   *    - inode is setuid
> - *    - inode is setgid and group-exec
>   *    - access failure for read and write
>   *
>   * Otherwise returns true.
>   */
> -static bool safe_hardlink_source(struct inode *inode)
> +static bool safe_hardlink_source_uid(struct inode *inode)
>  {
>  	umode_t mode = inode->i_mode;
>  
> @@ -936,10 +937,6 @@ static bool safe_hardlink_source(struct inode *inode)
>  	if (mode & S_ISUID)
>  		return false;
>  
> -	/* Executable setgid files should not get pinned to the filesystem. */
> -	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> -		return false;
> -
>  	/* Hardlinking to unreadable or unwritable sources is dangerous. */
>  	if (inode_permission(inode, MAY_READ | MAY_WRITE))
>  		return false;
> @@ -948,30 +945,62 @@ static bool safe_hardlink_source(struct inode *inode)
>  }
>  
>  /**
> + * safe_hardlink_source_gid - Check for safe hardlink conditions dependent
> + * on the inode's group. These conditions may be overridden by inode ownership
> + * or CAP_FOWNER with respect to the inode's gid
> + * @inode: the source inode to hardlink from
> + *
> + * Return false if inode is setgid and group-exec
> + *
> + * Otherwise returns true.
> + */
> +static bool safe_hardlink_source_gid(struct inode *inode)
> +{
> +	umode_t mode = inode->i_mode;
> +
> +	/* Executable setgid files should not get pinned to the filesystem. */
> +	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> +		return false;
> +
> +	return true;
> +}
> +
> +/**
>   * may_linkat - Check permissions for creating a hardlink
>   * @link: the source to hardlink from
>   *
>   * Block hardlink when all of:
>   *  - sysctl_protected_hardlinks enabled
>   *  - fsuid does not match inode
> - *  - hardlink source is unsafe (see safe_hardlink_source() above)
> + *  - hardlink source is unsafe (see safe_hardlink_source_*() above)
>   *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
> + *    (and inode gid mapped, if hardlink conditions depending on the inode's
> + *    group are not satisfied)
>   *
>   * Returns 0 if successful, -ve on error.
>   */
>  static int may_linkat(struct path *link)
>  {
>  	struct inode *inode;
> +	struct user_namespace *ns;
> +	bool owner;
> +	bool safe_uid;
> +	bool safe_gid;
>  
>  	if (!sysctl_protected_hardlinks)
>  		return 0;
>  
>  	inode = link->dentry->d_inode;
> +	ns = current_user_ns();
>  
>  	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
>  	 * otherwise, it must be a safe source.
>  	 */
> -	if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
> +	owner = inode_owner_or_capable(inode);
> +	safe_uid = safe_hardlink_source_uid(inode) || owner;
> +	safe_gid = safe_hardlink_source_gid(inode) ||
> +			(owner && kgid_has_mapping(ns, inode->i_gid));
> +	if (safe_uid && safe_gid)
>  		return 0;
>  
>  	audit_log_link_denied("linkat", link);
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] namei: permit linking with CAP_FOWNER in userns
  2015-10-10 14:59 [PATCH] namei: permit linking with CAP_FOWNER in userns Dirk Steinmetz
  2015-10-20 14:09 ` Dirk Steinmetz
@ 2015-11-03 17:51 ` Kees Cook
  1 sibling, 0 replies; 35+ messages in thread
From: Kees Cook @ 2015-11-03 17:51 UTC (permalink / raw)
  To: Dirk Steinmetz
  Cc: Eric W. Biederman, Alexander Viro, linux-fsdevel, LKML,
	Andy Lutomirski, Serge Hallyn, Seth Forshee

On Sat, Oct 10, 2015 at 7:59 AM, Dirk Steinmetz
<public@rsjtdrjgfuzkfg.com> wrote:
> Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
> within an unprivileged user namespace fails, even if CAP_FOWNER is held
> within the namespace. This may cause various failures, such as a gentoo
> installation within a lxc container failing to build and install specific
> packages.
>
> This change permits hardlinking of files owned by mapped uids, if
> CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
> by using the existing inode_owner_or_capable(), which is aware of
> namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change
> inode_capable to capable_wrt_inode_uidgid").
>
> Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>

Sorry for the delay: was travelling when I got put on CC. (FWIW, in
the future, please check the scripts/get_maintainer.pl script with
--git-blame to build CC lists, then I would have been CCed earlier.)

I think Eric's already taken this patch, but it looks correct to me:

Acked-by: Kees Cook <keescook@chromium.org>

I'll hop on the other thread to discuss the setgid issue.

-Kees

> ---
>  fs/namei.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 726d211..29fc6a6 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -955,26 +955,23 @@ static bool safe_hardlink_source(struct inode *inode)
>   *  - sysctl_protected_hardlinks enabled
>   *  - fsuid does not match inode
>   *  - hardlink source is unsafe (see safe_hardlink_source() above)
> - *  - not CAP_FOWNER
> + *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
>   *
>   * Returns 0 if successful, -ve on error.
>   */
>  static int may_linkat(struct path *link)
>  {
> -       const struct cred *cred;
>         struct inode *inode;
>
>         if (!sysctl_protected_hardlinks)
>                 return 0;
>
> -       cred = current_cred();
>         inode = link->dentry->d_inode;
>
>         /* Source inode owner (or CAP_FOWNER) can hardlink all they like,
>          * otherwise, it must be a safe source.
>          */
> -       if (uid_eq(cred->fsuid, inode->i_uid) || safe_hardlink_source(inode) ||
> -           capable(CAP_FOWNER))
> +       if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
>                 return 0;
>
>         audit_log_link_denied("linkat", link);
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-03  0:39                   ` [RFC] namei: prevent sgid-hardlinks for unmapped gids Dirk Steinmetz
  2015-11-03 15:44                     ` Serge Hallyn
@ 2015-11-03 18:20                     ` Kees Cook
  2015-11-03 23:21                       ` Dirk Steinmetz
  2015-11-04 14:46                       ` Serge E. Hallyn
  1 sibling, 2 replies; 35+ messages in thread
From: Kees Cook @ 2015-11-03 18:20 UTC (permalink / raw)
  To: Dirk Steinmetz
  Cc: Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn

On Mon, Nov 2, 2015 at 4:39 PM, Dirk Steinmetz
<public@rsjtdrjgfuzkfg.com> wrote:
> In order to hardlink to a sgid-executable, it is sufficient to be the
> file's owner. When hardlinking within an unprivileged user namespace, the
> users of that namespace could thus use hardlinks to pin setgid binaries
> owned by themselves (or any mapped uid, with CAP_FOWNER) and a gid outside
> of the namespace. This is a possible security risk.

How would such a file appear within the namespace? Wouldn't the gid
have to map to something inside the namespace?

>
> This change prevents hardlinking of sgid-executables within user
> namespaces, if the file is not owned by a mapped gid.

For clarity, this should say "... is not group-owned by a mapped git." correct?

> Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
> ---
>
> MISSING: Documentation/sysctl/fs.txt not updated, as this patch is
> intended for discussion.
>
> If there are no further misunderstandings on my side, this patch is what
> Serge and I agree on (modulo my not-that-much-linux-kernel-experience
> codestyle, feel free to suggest improvements!).
>
> The new condition for sgid-executables is equivalent to
>> inode_owner_or_capable(inode) && kgid_has_mapping(ns, inode->i_gid)
> which, as recommended by Serge, does not change the behaviour for the init
> namespace. It fixes the problem of pinning parent namespace's gids.
>
> However, I think the "same" security issue is also valid within any
> namespace, for regular users pinning other gids within the same namespace.
> I already presented an example for that in a previous mail:
> - A file has the setgid and user/group executable bits set, and is owned
>   by user:group.
> - The user 'user' is not in the group 'group', and does not have any
>   capabilities.
> - The user 'user' hardlinks the file. The permission check will succeed,
>   as the user is the owner of the file.
> - The file is replaced with a newer version (for example fixing a security
>   issue)
> - Now user can still use the hardlink-pinned version to execute the file
>   as 'user:group' (and for example exploit the security issue).

I believe this to be an unneeded check is the stated configuration
(setgid but without group ownership) is itself a security flaw. This
allows the user already to gain those group privileges even without
needing to pin the file or do anything:

setgid executable that reports euid and egid:

$ cat poof.c
#include <stdio.h>

int main(void)
{
    printf("%d:%d\n", geteuid(), getegid());
    return 0;
}
$ make poof
cc     poof.c   -o poof
$ sudo chgrp root poof && sudo chmod g+s poof
$ ls -la poof
-rwxr-s--- 1 keescook root 8658 Nov  3 10:14 poof
$ ./poof
149786:0

I am not a member of the 0 group:

$ id
uid=149786(keescook) gid=5000(eng)
groups=5000(eng),4(adm),20(dialout),21(fax),24(cdrom),25(floppy),26(tape),27(sudo),30(dip),44(video),46(plugdev),106(fuse),110(lpadmin),124(sambashare),129(pkcs11),133(libvirtd),999(logindev)

Now I mmap the file, and rewrite it (here I change the format string
from a : separator to a -, but we just just as easily injected code):

$ cat mod.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    int fd;
    struct stat info;
    unsigned char *ptr;
    off_t i;

    fd = open("poof", O_RDWR);
    fstat(fd, &info);
    ptr = mmap(NULL, info.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);

    for (i = 0; i < info.st_size; i++) {
        if (0 == strncmp(ptr + i, "%d:%d", 5)) {
            ptr[i + 2] = '-';
        }
    }
    munmap(ptr, info.st_size);

    system("./poof");

    return 0;
}
$ make mod
cc     mod.c   -o mod
$ ./mod
149786-0
$ ls -la poof
-rwxr-s--- 1 keescook root 8658 Nov  3 10:17 poof

So, I don't think this patch actually makes anything safer, though
there might be a namespace mapping element I've not understood.

-Kees

>
> To prevent that, the condition would need to be changed to something like
> inode_group_or_capable, resembling inode_owner_or_capable, but checking
> that the caller is in the group the inode belongs to or has some
> capability (for consistency with former behaviour, CAP_FOWNER? for
> consistency with the documentation, CAP_FSETID?). However, this would
> change userland behaviour outside of userns. Thus my main question:
> Is the scenario above bad enough to change userland behaviour?
>
> I'd apprechiate your comments.
>
> - Dirk
>
>
> Diffstat:
>  fs/namei.c | 47 ++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 29fc6a6..9c6c2e2 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -913,18 +913,19 @@ static inline int may_follow_link(struct nameidata *nd)
>  }
>
>  /**
> - * safe_hardlink_source - Check for safe hardlink conditions
> + * safe_hardlink_source_uid - Check for safe hardlink conditions not dependent
> + * on the inode's group. These conditions may be overridden by inode ownership
> + * or CAP_FOWNER with respect to the inode's uid
>   * @inode: the source inode to hardlink from
>   *
>   * Return false if at least one of the following conditions:
>   *    - inode is not a regular file
>   *    - inode is setuid
> - *    - inode is setgid and group-exec
>   *    - access failure for read and write
>   *
>   * Otherwise returns true.
>   */
> -static bool safe_hardlink_source(struct inode *inode)
> +static bool safe_hardlink_source_uid(struct inode *inode)
>  {
>         umode_t mode = inode->i_mode;
>
> @@ -936,10 +937,6 @@ static bool safe_hardlink_source(struct inode *inode)
>         if (mode & S_ISUID)
>                 return false;
>
> -       /* Executable setgid files should not get pinned to the filesystem. */
> -       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> -               return false;
> -
>         /* Hardlinking to unreadable or unwritable sources is dangerous. */
>         if (inode_permission(inode, MAY_READ | MAY_WRITE))
>                 return false;
> @@ -948,30 +945,62 @@ static bool safe_hardlink_source(struct inode *inode)
>  }
>
>  /**
> + * safe_hardlink_source_gid - Check for safe hardlink conditions dependent
> + * on the inode's group. These conditions may be overridden by inode ownership
> + * or CAP_FOWNER with respect to the inode's gid
> + * @inode: the source inode to hardlink from
> + *
> + * Return false if inode is setgid and group-exec
> + *
> + * Otherwise returns true.
> + */
> +static bool safe_hardlink_source_gid(struct inode *inode)
> +{
> +       umode_t mode = inode->i_mode;
> +
> +       /* Executable setgid files should not get pinned to the filesystem. */
> +       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> +               return false;
> +
> +       return true;
> +}
> +
> +/**
>   * may_linkat - Check permissions for creating a hardlink
>   * @link: the source to hardlink from
>   *
>   * Block hardlink when all of:
>   *  - sysctl_protected_hardlinks enabled
>   *  - fsuid does not match inode
> - *  - hardlink source is unsafe (see safe_hardlink_source() above)
> + *  - hardlink source is unsafe (see safe_hardlink_source_*() above)
>   *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
> + *    (and inode gid mapped, if hardlink conditions depending on the inode's
> + *    group are not satisfied)
>   *
>   * Returns 0 if successful, -ve on error.
>   */
>  static int may_linkat(struct path *link)
>  {
>         struct inode *inode;
> +       struct user_namespace *ns;
> +       bool owner;
> +       bool safe_uid;
> +       bool safe_gid;
>
>         if (!sysctl_protected_hardlinks)
>                 return 0;
>
>         inode = link->dentry->d_inode;
> +       ns = current_user_ns();
>
>         /* Source inode owner (or CAP_FOWNER) can hardlink all they like,
>          * otherwise, it must be a safe source.
>          */
> -       if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
> +       owner = inode_owner_or_capable(inode);
> +       safe_uid = safe_hardlink_source_uid(inode) || owner;
> +       safe_gid = safe_hardlink_source_gid(inode) ||
> +                       (owner && kgid_has_mapping(ns, inode->i_gid));
> +       if (safe_uid && safe_gid)
>                 return 0;
>
>         audit_log_link_denied("linkat", link);
> --
> 2.1.4
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-03 18:20                     ` Kees Cook
@ 2015-11-03 23:21                       ` Dirk Steinmetz
  2015-11-03 23:29                         ` Kees Cook
  2015-11-04 14:46                       ` Serge E. Hallyn
  1 sibling, 1 reply; 35+ messages in thread
From: Dirk Steinmetz @ 2015-11-03 23:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn

On Tue, 3 Nov 2015 10:20:38 -0800, Kees Cook wrote:
> On Mon, Nov 2, 2015 at 4:39 PM, Dirk Steinmetz
> <public@rsjtdrjgfuzkfg.com> wrote:
> > In order to hardlink to a sgid-executable, it is sufficient to be the
> > file's owner. When hardlinking within an unprivileged user namespace, the
> > users of that namespace could thus use hardlinks to pin setgid binaries
> > owned by themselves (or any mapped uid, with CAP_FOWNER) and a gid outside
> > of the namespace. This is a possible security risk.
> 
> How would such a file appear within the namespace? Wouldn't the gid
> have to map to something inside the namespace?
> 
> >
> > This change prevents hardlinking of sgid-executables within user
> > namespaces, if the file is not owned by a mapped gid.
> 
> For clarity, this should say "... is not group-owned by a mapped git." correct?
> 
> > Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
> > ---
> >
> > MISSING: Documentation/sysctl/fs.txt not updated, as this patch is
> > intended for discussion.
> >
> > If there are no further misunderstandings on my side, this patch is what
> > Serge and I agree on (modulo my not-that-much-linux-kernel-experience
> > codestyle, feel free to suggest improvements!).
> >
> > The new condition for sgid-executables is equivalent to
> >> inode_owner_or_capable(inode) && kgid_has_mapping(ns, inode->i_gid)
> > which, as recommended by Serge, does not change the behaviour for the init
> > namespace. It fixes the problem of pinning parent namespace's gids.
> >
> > However, I think the "same" security issue is also valid within any
> > namespace, for regular users pinning other gids within the same namespace.
> > I already presented an example for that in a previous mail:
> > - A file has the setgid and user/group executable bits set, and is owned
> >   by user:group.
> > - The user 'user' is not in the group 'group', and does not have any
> >   capabilities.
> > - The user 'user' hardlinks the file. The permission check will succeed,
> >   as the user is the owner of the file.
> > - The file is replaced with a newer version (for example fixing a security
> >   issue)
> > - Now user can still use the hardlink-pinned version to execute the file
> >   as 'user:group' (and for example exploit the security issue).
> 
> I believe this to be an unneeded check is the stated configuration
> (setgid but without group ownership) is itself a security flaw. This
> allows the user already to gain those group privileges even without
> needing to pin the file or do anything:
> 
> setgid executable that reports euid and egid:
> 
> $ cat poof.c
> #include <stdio.h>
> 
> int main(void)
> {
>     printf("%d:%d\n", geteuid(), getegid());
>     return 0;
> }
> $ make poof
> cc     poof.c   -o poof
> $ sudo chgrp root poof && sudo chmod g+s poof
> $ ls -la poof
> -rwxr-s--- 1 keescook root 8658 Nov  3 10:14 poof
> $ ./poof
> 149786:0
> 
> I am not a member of the 0 group:
> 
> $ id
> uid=149786(keescook) gid=5000(eng)
> groups=5000(eng),4(adm),20(dialout),21(fax),24(cdrom),25(floppy),26(tape),27(sudo),30(dip),44(video),46(plugdev),106(fuse),110(lpadmin),124(sambashare),129(pkcs11),133(libvirtd),999(logindev)
> 
> Now I mmap the file, and rewrite it (here I change the format string
> from a : separator to a -, but we just just as easily injected code):
> 
> $ cat mod.c
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/mman.h>
> 
> int main(void)
> {
>     int fd;
>     struct stat info;
>     unsigned char *ptr;
>     off_t i;
> 
>     fd = open("poof", O_RDWR);
>     fstat(fd, &info);
>     ptr = mmap(NULL, info.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>     close(fd);
> 
>     for (i = 0; i < info.st_size; i++) {
>         if (0 == strncmp(ptr + i, "%d:%d", 5)) {
>             ptr[i + 2] = '-';
>         }
>     }
>     munmap(ptr, info.st_size);
> 
>     system("./poof");
> 
>     return 0;
> }
> $ make mod
> cc     mod.c   -o mod
> $ ./mod
> 149786-0
> $ ls -la poof
> -rwxr-s--- 1 keescook root 8658 Nov  3 10:17 poof
> 
> So, I don't think this patch actually makes anything safer, though
> there might be a namespace mapping element I've not understood.

Thank you for that beautiful demonstration!

I agree. I was unaware that sgid-executables on foreign groups were already
considered as insecure, as the documentation states
"As a security measure, depending on the filesystem, the set-user-ID
and set-group-ID execution bits may be turned off if a file is written.
(On Linux this occurs if the writing process does not have the
CAP_FSETID capability.)" [1] -- I blindly assumed that this would be the
case for most filesystems and incorrectly derived that sgid with foreign
groups are a supported scenario. As you demonstrated, they are not, and
thus this patch and the sgid-issue discussed are irrelevant, as the
scenario is unsafe by design and not by error.

The only question remaining is whether the documentation should be
re-worded to prevent other people from doing the same mistake and assuming
the scenario would be supported.

However, I'm unsure how to word it properly; my best guess would be
"The set-user-ID and set-group-ID execution bits may be turned off if a
file is written. (On Linux, this is guaranteed to only happen if the
writing process does not have the CAP_FSETID capability. You should not
rely on this behavior as a security measure: any user having write access
to or owning the file should be considered as able to impersonate the
file's owner/group-owner)"

- Dirk


Reference:
[1] <http://man7.org/linux/man-pages/man2/chmod.2.html>

> 
> -Kees
> 
> >
> > To prevent that, the condition would need to be changed to something like
> > inode_group_or_capable, resembling inode_owner_or_capable, but checking
> > that the caller is in the group the inode belongs to or has some
> > capability (for consistency with former behaviour, CAP_FOWNER? for
> > consistency with the documentation, CAP_FSETID?). However, this would
> > change userland behaviour outside of userns. Thus my main question:
> > Is the scenario above bad enough to change userland behaviour?
> >
> > I'd apprechiate your comments.
> >
> > - Dirk
> >
> >
> > Diffstat:
> >  fs/namei.c | 47 ++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 38 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 29fc6a6..9c6c2e2 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -913,18 +913,19 @@ static inline int may_follow_link(struct nameidata *nd)
> >  }
> >
> >  /**
> > - * safe_hardlink_source - Check for safe hardlink conditions
> > + * safe_hardlink_source_uid - Check for safe hardlink conditions not dependent
> > + * on the inode's group. These conditions may be overridden by inode ownership
> > + * or CAP_FOWNER with respect to the inode's uid
> >   * @inode: the source inode to hardlink from
> >   *
> >   * Return false if at least one of the following conditions:
> >   *    - inode is not a regular file
> >   *    - inode is setuid
> > - *    - inode is setgid and group-exec
> >   *    - access failure for read and write
> >   *
> >   * Otherwise returns true.
> >   */
> > -static bool safe_hardlink_source(struct inode *inode)
> > +static bool safe_hardlink_source_uid(struct inode *inode)
> >  {
> >         umode_t mode = inode->i_mode;
> >
> > @@ -936,10 +937,6 @@ static bool safe_hardlink_source(struct inode *inode)
> >         if (mode & S_ISUID)
> >                 return false;
> >
> > -       /* Executable setgid files should not get pinned to the filesystem. */
> > -       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> > -               return false;
> > -
> >         /* Hardlinking to unreadable or unwritable sources is dangerous. */
> >         if (inode_permission(inode, MAY_READ | MAY_WRITE))
> >                 return false;
> > @@ -948,30 +945,62 @@ static bool safe_hardlink_source(struct inode *inode)
> >  }
> >
> >  /**
> > + * safe_hardlink_source_gid - Check for safe hardlink conditions dependent
> > + * on the inode's group. These conditions may be overridden by inode ownership
> > + * or CAP_FOWNER with respect to the inode's gid
> > + * @inode: the source inode to hardlink from
> > + *
> > + * Return false if inode is setgid and group-exec
> > + *
> > + * Otherwise returns true.
> > + */
> > +static bool safe_hardlink_source_gid(struct inode *inode)
> > +{
> > +       umode_t mode = inode->i_mode;
> > +
> > +       /* Executable setgid files should not get pinned to the filesystem. */
> > +       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> > +/**
> >   * may_linkat - Check permissions for creating a hardlink
> >   * @link: the source to hardlink from
> >   *
> >   * Block hardlink when all of:
> >   *  - sysctl_protected_hardlinks enabled
> >   *  - fsuid does not match inode
> > - *  - hardlink source is unsafe (see safe_hardlink_source() above)
> > + *  - hardlink source is unsafe (see safe_hardlink_source_*() above)
> >   *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
> > + *    (and inode gid mapped, if hardlink conditions depending on the inode's
> > + *    group are not satisfied)
> >   *
> >   * Returns 0 if successful, -ve on error.
> >   */
> >  static int may_linkat(struct path *link)
> >  {
> >         struct inode *inode;
> > +       struct user_namespace *ns;
> > +       bool owner;
> > +       bool safe_uid;
> > +       bool safe_gid;
> >
> >         if (!sysctl_protected_hardlinks)
> >                 return 0;
> >
> >         inode = link->dentry->d_inode;
> > +       ns = current_user_ns();
> >
> >         /* Source inode owner (or CAP_FOWNER) can hardlink all they like,
> >          * otherwise, it must be a safe source.
> >          */
> > -       if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
> > +       owner = inode_owner_or_capable(inode);
> > +       safe_uid = safe_hardlink_source_uid(inode) || owner;
> > +       safe_gid = safe_hardlink_source_gid(inode) ||
> > +                       (owner && kgid_has_mapping(ns, inode->i_gid));
> > +       if (safe_uid && safe_gid)
> >                 return 0;
> >
> >         audit_log_link_denied("linkat", link);
> > --
> > 2.1.4
> >
> 
> 
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-03 23:21                       ` Dirk Steinmetz
@ 2015-11-03 23:29                         ` Kees Cook
  2015-11-04  6:58                           ` Willy Tarreau
  0 siblings, 1 reply; 35+ messages in thread
From: Kees Cook @ 2015-11-03 23:29 UTC (permalink / raw)
  To: Dirk Steinmetz, Michael Kerrisk-manpages
  Cc: Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Tue, Nov 3, 2015 at 3:21 PM, Dirk Steinmetz
<public@rsjtdrjgfuzkfg.com> wrote:
> On Tue, 3 Nov 2015 10:20:38 -0800, Kees Cook wrote:
>> On Mon, Nov 2, 2015 at 4:39 PM, Dirk Steinmetz
>> <public@rsjtdrjgfuzkfg.com> wrote:
>> > In order to hardlink to a sgid-executable, it is sufficient to be the
>> > file's owner. When hardlinking within an unprivileged user namespace, the
>> > users of that namespace could thus use hardlinks to pin setgid binaries
>> > owned by themselves (or any mapped uid, with CAP_FOWNER) and a gid outside
>> > of the namespace. This is a possible security risk.
>>
>> How would such a file appear within the namespace? Wouldn't the gid
>> have to map to something inside the namespace?
>>
>> >
>> > This change prevents hardlinking of sgid-executables within user
>> > namespaces, if the file is not owned by a mapped gid.
>>
>> For clarity, this should say "... is not group-owned by a mapped git." correct?
>>
>> > Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>
>> > ---
>> >
>> > MISSING: Documentation/sysctl/fs.txt not updated, as this patch is
>> > intended for discussion.
>> >
>> > If there are no further misunderstandings on my side, this patch is what
>> > Serge and I agree on (modulo my not-that-much-linux-kernel-experience
>> > codestyle, feel free to suggest improvements!).
>> >
>> > The new condition for sgid-executables is equivalent to
>> >> inode_owner_or_capable(inode) && kgid_has_mapping(ns, inode->i_gid)
>> > which, as recommended by Serge, does not change the behaviour for the init
>> > namespace. It fixes the problem of pinning parent namespace's gids.
>> >
>> > However, I think the "same" security issue is also valid within any
>> > namespace, for regular users pinning other gids within the same namespace.
>> > I already presented an example for that in a previous mail:
>> > - A file has the setgid and user/group executable bits set, and is owned
>> >   by user:group.
>> > - The user 'user' is not in the group 'group', and does not have any
>> >   capabilities.
>> > - The user 'user' hardlinks the file. The permission check will succeed,
>> >   as the user is the owner of the file.
>> > - The file is replaced with a newer version (for example fixing a security
>> >   issue)
>> > - Now user can still use the hardlink-pinned version to execute the file
>> >   as 'user:group' (and for example exploit the security issue).
>>
>> I believe this to be an unneeded check is the stated configuration
>> (setgid but without group ownership) is itself a security flaw. This
>> allows the user already to gain those group privileges even without
>> needing to pin the file or do anything:
>>
>> setgid executable that reports euid and egid:
>>
>> $ cat poof.c
>> #include <stdio.h>
>>
>> int main(void)
>> {
>>     printf("%d:%d\n", geteuid(), getegid());
>>     return 0;
>> }
>> $ make poof
>> cc     poof.c   -o poof
>> $ sudo chgrp root poof && sudo chmod g+s poof
>> $ ls -la poof
>> -rwxr-s--- 1 keescook root 8658 Nov  3 10:14 poof
>> $ ./poof
>> 149786:0
>>
>> I am not a member of the 0 group:
>>
>> $ id
>> uid=149786(keescook) gid=5000(eng)
>> groups=5000(eng),4(adm),20(dialout),21(fax),24(cdrom),25(floppy),26(tape),27(sudo),30(dip),44(video),46(plugdev),106(fuse),110(lpadmin),124(sambashare),129(pkcs11),133(libvirtd),999(logindev)
>>
>> Now I mmap the file, and rewrite it (here I change the format string
>> from a : separator to a -, but we just just as easily injected code):
>>
>> $ cat mod.c
>> #include <stdio.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <sys/mman.h>
>>
>> int main(void)
>> {
>>     int fd;
>>     struct stat info;
>>     unsigned char *ptr;
>>     off_t i;
>>
>>     fd = open("poof", O_RDWR);
>>     fstat(fd, &info);
>>     ptr = mmap(NULL, info.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>     close(fd);
>>
>>     for (i = 0; i < info.st_size; i++) {
>>         if (0 == strncmp(ptr + i, "%d:%d", 5)) {
>>             ptr[i + 2] = '-';
>>         }
>>     }
>>     munmap(ptr, info.st_size);
>>
>>     system("./poof");
>>
>>     return 0;
>> }
>> $ make mod
>> cc     mod.c   -o mod
>> $ ./mod
>> 149786-0
>> $ ls -la poof
>> -rwxr-s--- 1 keescook root 8658 Nov  3 10:17 poof
>>
>> So, I don't think this patch actually makes anything safer, though
>> there might be a namespace mapping element I've not understood.
>
> Thank you for that beautiful demonstration!

You're welcome. It was a fun diversion. Weirdly, it's only possible
via mmap. Using "write" does kill the set-gid bit. I haven't looked at
why.
Al or anyone else, is there a meaningful distinction here? Should the
mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
holding the file open with either open or mmap, I get a Text-in-use
error, so I would kind of expect the same behavior between either
close() and munmap(). I wonder if this is a bug, and if so, then your
link patch is indeed useful again. :)

> I agree. I was unaware that sgid-executables on foreign groups were already
> considered as insecure, as the documentation states
> "As a security measure, depending on the filesystem, the set-user-ID
> and set-group-ID execution bits may be turned off if a file is written.
> (On Linux this occurs if the writing process does not have the
> CAP_FSETID capability.)" [1] -- I blindly assumed that this would be the
> case for most filesystems and incorrectly derived that sgid with foreign
> groups are a supported scenario. As you demonstrated, they are not, and
> thus this patch and the sgid-issue discussed are irrelevant, as the
> scenario is unsafe by design and not by error.
>
> The only question remaining is whether the documentation should be
> re-worded to prevent other people from doing the same mistake and assuming
> the scenario would be supported.
>
> However, I'm unsure how to word it properly; my best guess would be
> "The set-user-ID and set-group-ID execution bits may be turned off if a
> file is written. (On Linux, this is guaranteed to only happen if the
> writing process does not have the CAP_FSETID capability. You should not
> rely on this behavior as a security measure: any user having write access
> to or owning the file should be considered as able to impersonate the
> file's owner/group-owner)"

That wording seems good to me. Adding Michael for his thoughts.

-Kees

>
> - Dirk
>
>
> Reference:
> [1] <http://man7.org/linux/man-pages/man2/chmod.2.html>
>
>>
>> -Kees
>>
>> >
>> > To prevent that, the condition would need to be changed to something like
>> > inode_group_or_capable, resembling inode_owner_or_capable, but checking
>> > that the caller is in the group the inode belongs to or has some
>> > capability (for consistency with former behaviour, CAP_FOWNER? for
>> > consistency with the documentation, CAP_FSETID?). However, this would
>> > change userland behaviour outside of userns. Thus my main question:
>> > Is the scenario above bad enough to change userland behaviour?
>> >
>> > I'd apprechiate your comments.
>> >
>> > - Dirk
>> >
>> >
>> > Diffstat:
>> >  fs/namei.c | 47 ++++++++++++++++++++++++++++++++++++++---------
>> >  1 file changed, 38 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/fs/namei.c b/fs/namei.c
>> > index 29fc6a6..9c6c2e2 100644
>> > --- a/fs/namei.c
>> > +++ b/fs/namei.c
>> > @@ -913,18 +913,19 @@ static inline int may_follow_link(struct nameidata *nd)
>> >  }
>> >
>> >  /**
>> > - * safe_hardlink_source - Check for safe hardlink conditions
>> > + * safe_hardlink_source_uid - Check for safe hardlink conditions not dependent
>> > + * on the inode's group. These conditions may be overridden by inode ownership
>> > + * or CAP_FOWNER with respect to the inode's uid
>> >   * @inode: the source inode to hardlink from
>> >   *
>> >   * Return false if at least one of the following conditions:
>> >   *    - inode is not a regular file
>> >   *    - inode is setuid
>> > - *    - inode is setgid and group-exec
>> >   *    - access failure for read and write
>> >   *
>> >   * Otherwise returns true.
>> >   */
>> > -static bool safe_hardlink_source(struct inode *inode)
>> > +static bool safe_hardlink_source_uid(struct inode *inode)
>> >  {
>> >         umode_t mode = inode->i_mode;
>> >
>> > @@ -936,10 +937,6 @@ static bool safe_hardlink_source(struct inode *inode)
>> >         if (mode & S_ISUID)
>> >                 return false;
>> >
>> > -       /* Executable setgid files should not get pinned to the filesystem. */
>> > -       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
>> > -               return false;
>> > -
>> >         /* Hardlinking to unreadable or unwritable sources is dangerous. */
>> >         if (inode_permission(inode, MAY_READ | MAY_WRITE))
>> >                 return false;
>> > @@ -948,30 +945,62 @@ static bool safe_hardlink_source(struct inode *inode)
>> >  }
>> >
>> >  /**
>> > + * safe_hardlink_source_gid - Check for safe hardlink conditions dependent
>> > + * on the inode's group. These conditions may be overridden by inode ownership
>> > + * or CAP_FOWNER with respect to the inode's gid
>> > + * @inode: the source inode to hardlink from
>> > + *
>> > + * Return false if inode is setgid and group-exec
>> > + *
>> > + * Otherwise returns true.
>> > + */
>> > +static bool safe_hardlink_source_gid(struct inode *inode)
>> > +{
>> > +       umode_t mode = inode->i_mode;
>> > +
>> > +       /* Executable setgid files should not get pinned to the filesystem. */
>> > +       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
>> > +               return false;
>> > +
>> > +       return true;
>> > +}
>> > +
>> > +/**
>> >   * may_linkat - Check permissions for creating a hardlink
>> >   * @link: the source to hardlink from
>> >   *
>> >   * Block hardlink when all of:
>> >   *  - sysctl_protected_hardlinks enabled
>> >   *  - fsuid does not match inode
>> > - *  - hardlink source is unsafe (see safe_hardlink_source() above)
>> > + *  - hardlink source is unsafe (see safe_hardlink_source_*() above)
>> >   *  - not CAP_FOWNER in a namespace with the inode owner uid mapped
>> > + *    (and inode gid mapped, if hardlink conditions depending on the inode's
>> > + *    group are not satisfied)
>> >   *
>> >   * Returns 0 if successful, -ve on error.
>> >   */
>> >  static int may_linkat(struct path *link)
>> >  {
>> >         struct inode *inode;
>> > +       struct user_namespace *ns;
>> > +       bool owner;
>> > +       bool safe_uid;
>> > +       bool safe_gid;
>> >
>> >         if (!sysctl_protected_hardlinks)
>> >                 return 0;
>> >
>> >         inode = link->dentry->d_inode;
>> > +       ns = current_user_ns();
>> >
>> >         /* Source inode owner (or CAP_FOWNER) can hardlink all they like,
>> >          * otherwise, it must be a safe source.
>> >          */
>> > -       if (inode_owner_or_capable(inode) || safe_hardlink_source(inode))
>> > +       owner = inode_owner_or_capable(inode);
>> > +       safe_uid = safe_hardlink_source_uid(inode) || owner;
>> > +       safe_gid = safe_hardlink_source_gid(inode) ||
>> > +                       (owner && kgid_has_mapping(ns, inode->i_gid));
>> > +       if (safe_uid && safe_gid)
>> >                 return 0;
>> >
>> >         audit_log_link_denied("linkat", link);
>> > --
>> > 2.1.4
>> >
>>
>>
>>
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-03 23:29                         ` Kees Cook
@ 2015-11-04  6:58                           ` Willy Tarreau
  2015-11-04 17:59                             ` Andy Lutomirski
  0 siblings, 1 reply; 35+ messages in thread
From: Willy Tarreau @ 2015-11-04  6:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Dirk Steinmetz, Michael Kerrisk-manpages, Serge Hallyn,
	Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
> Using "write" does kill the set-gid bit. I haven't looked at
> why.
> Al or anyone else, is there a meaningful distinction here?

I remember this one, I got caught once while trying to put a shell into
a suid-writable file to get some privileges someone forgot to offer me :-)

It's done by should_remove_suid() which is called upon write() and truncate().

> Should the
> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
> holding the file open with either open or mmap, I get a Text-in-use
> error, so I would kind of expect the same behavior between either
> close() and munmap(). I wonder if this is a bug, and if so, then your
> link patch is indeed useful again. :)

I don't see how this could be done with mmap(). Maybe we have a way to know
when the first write is performed via this path, I have no idea.

Willy


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-03 18:20                     ` Kees Cook
  2015-11-03 23:21                       ` Dirk Steinmetz
@ 2015-11-04 14:46                       ` Serge E. Hallyn
  1 sibling, 0 replies; 35+ messages in thread
From: Serge E. Hallyn @ 2015-11-04 14:46 UTC (permalink / raw)
  To: Kees Cook
  Cc: Dirk Steinmetz, Serge Hallyn, Seth Forshee, Alexander Viro,
	Linux FS Devel, LKML, Eric W . Biederman, Serge Hallyn

On Tue, Nov 03, 2015 at 10:20:38AM -0800, Kees Cook wrote:
> On Mon, Nov 2, 2015 at 4:39 PM, Dirk Steinmetz
> <public@rsjtdrjgfuzkfg.com> wrote:
> > In order to hardlink to a sgid-executable, it is sufficient to be the
> > file's owner. When hardlinking within an unprivileged user namespace, the
> > users of that namespace could thus use hardlinks to pin setgid binaries
> > owned by themselves (or any mapped uid, with CAP_FOWNER) and a gid outside
> > of the namespace. This is a possible security risk.
> 
> How would such a file appear within the namespace? Wouldn't the gid
> have to map to something inside the namespace?

Inside the namespace it would appear as gid -1.  Outside the namespace
it would appear as the real gid.  So the problem would be if I am allowed
to map the file owning uid but not gid;  I make a new link to the file;
I wait for a vulnerability to be found;  host admin updates the original
file;  now on the host I run the file - having learned how to exploit the
vulnerability through no ingenuity of my own - and own all files owned
by that gid.

-serge

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-04  6:58                           ` Willy Tarreau
@ 2015-11-04 17:59                             ` Andy Lutomirski
  2015-11-04 18:15                               ` Willy Tarreau
  2015-11-06 21:59                               ` Kees Cook
  0 siblings, 2 replies; 35+ messages in thread
From: Andy Lutomirski @ 2015-11-04 17:59 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Kees Cook, Dirk Steinmetz, Michael Kerrisk-manpages,
	Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>> Using "write" does kill the set-gid bit. I haven't looked at
>> why.
>> Al or anyone else, is there a meaningful distinction here?
>
> I remember this one, I got caught once while trying to put a shell into
> a suid-writable file to get some privileges someone forgot to offer me :-)
>
> It's done by should_remove_suid() which is called upon write() and truncate().
>
>> Should the
>> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
>> holding the file open with either open or mmap, I get a Text-in-use
>> error, so I would kind of expect the same behavior between either
>> close() and munmap(). I wonder if this is a bug, and if so, then your
>> link patch is indeed useful again. :)
>
> I don't see how this could be done with mmap(). Maybe we have a way to know
> when the first write is performed via this path, I have no idea.

do_wp_page might be a decent bet.

--Andy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-04 17:59                             ` Andy Lutomirski
@ 2015-11-04 18:15                               ` Willy Tarreau
  2015-11-04 18:17                                 ` Andy Lutomirski
  2015-11-06 21:59                               ` Kees Cook
  1 sibling, 1 reply; 35+ messages in thread
From: Willy Tarreau @ 2015-11-04 18:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Dirk Steinmetz, Michael Kerrisk-manpages,
	Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Wed, Nov 04, 2015 at 09:59:55AM -0800, Andy Lutomirski wrote:
> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
> > On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
> >> Using "write" does kill the set-gid bit. I haven't looked at
> >> why.
> >> Al or anyone else, is there a meaningful distinction here?
> >
> > I remember this one, I got caught once while trying to put a shell into
> > a suid-writable file to get some privileges someone forgot to offer me :-)
> >
> > It's done by should_remove_suid() which is called upon write() and truncate().
> >
> >> Should the
> >> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
> >> holding the file open with either open or mmap, I get a Text-in-use
> >> error, so I would kind of expect the same behavior between either
> >> close() and munmap(). I wonder if this is a bug, and if so, then your
> >> link patch is indeed useful again. :)
> >
> > I don't see how this could be done with mmap(). Maybe we have a way to know
> > when the first write is performed via this path, I have no idea.
> 
> do_wp_page might be a decent bet.

Yep probably at the same place where we update the file's time ?

That said I never feel completely comfortable with changing a file's
permissions this way, I always fear it could break backup/restore
applications. Let's imagine for a minute that a restore does this :

 extract(const char *file_name, int file_perms) {
   fd = open(".tmpfile", O_CREAT, file_perms);
   mmap(fd);
   /* actually write file */
   close(fd);
   unlink(real_file_name);
   rename(".tmpfile", file_name);
 }

Yes I know it's not safe to do the chmod before writing to the file
but we could imagine some situations where it makes sense to be done
this way (eg: if the file is put into a protected directory), and
anyway this possibility is provided by open() and creat() so it is
legitimate to imagine these ones could exist.

Such a change would slightly modify semantics and affect such use cases
*if they exist*, just like using write() instead of mmap() would fail.
We could imagine having a sysctl to disable this strengthening, but it
is probably not the best solution for the long term either.

Just my two cents,
Willy


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-04 18:15                               ` Willy Tarreau
@ 2015-11-04 18:17                                 ` Andy Lutomirski
  2015-11-04 18:28                                   ` Willy Tarreau
  0 siblings, 1 reply; 35+ messages in thread
From: Andy Lutomirski @ 2015-11-04 18:17 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Kees Cook, Dirk Steinmetz, Michael Kerrisk-manpages,
	Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Wed, Nov 4, 2015 at 10:15 AM, Willy Tarreau <w@1wt.eu> wrote:
> On Wed, Nov 04, 2015 at 09:59:55AM -0800, Andy Lutomirski wrote:
>> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
>> > On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>> >> Using "write" does kill the set-gid bit. I haven't looked at
>> >> why.
>> >> Al or anyone else, is there a meaningful distinction here?
>> >
>> > I remember this one, I got caught once while trying to put a shell into
>> > a suid-writable file to get some privileges someone forgot to offer me :-)
>> >
>> > It's done by should_remove_suid() which is called upon write() and truncate().
>> >
>> >> Should the
>> >> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
>> >> holding the file open with either open or mmap, I get a Text-in-use
>> >> error, so I would kind of expect the same behavior between either
>> >> close() and munmap(). I wonder if this is a bug, and if so, then your
>> >> link patch is indeed useful again. :)
>> >
>> > I don't see how this could be done with mmap(). Maybe we have a way to know
>> > when the first write is performed via this path, I have no idea.
>>
>> do_wp_page might be a decent bet.
>
> Yep probably at the same place where we update the file's time ?
>
> That said I never feel completely comfortable with changing a file's
> permissions this way, I always fear it could break backup/restore
> applications. Let's imagine for a minute that a restore does this :
>
>  extract(const char *file_name, int file_perms) {
>    fd = open(".tmpfile", O_CREAT, file_perms);
>    mmap(fd);
>    /* actually write file */
>    close(fd);
>    unlink(real_file_name);
>    rename(".tmpfile", file_name);
>  }
>
> Yes I know it's not safe to do the chmod before writing to the file
> but we could imagine some situations where it makes sense to be done
> this way (eg: if the file is put into a protected directory), and
> anyway this possibility is provided by open() and creat() so it is
> legitimate to imagine these ones could exist.
>
> Such a change would slightly modify semantics and affect such use cases
> *if they exist*, just like using write() instead of mmap() would fail.
> We could imagine having a sysctl to disable this strengthening, but it
> is probably not the best solution for the long term either.

I'd say that this is an acceptable breakage risk.  In any event, the
potential for data loss is limited to a bit of the file mode, and
restore apps like that really don't deserve to work in the first
place.

--Andy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-04 18:17                                 ` Andy Lutomirski
@ 2015-11-04 18:28                                   ` Willy Tarreau
  0 siblings, 0 replies; 35+ messages in thread
From: Willy Tarreau @ 2015-11-04 18:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Dirk Steinmetz, Michael Kerrisk-manpages,
	Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Wed, Nov 04, 2015 at 10:17:06AM -0800, Andy Lutomirski wrote:
> > That said I never feel completely comfortable with changing a file's
> > permissions this way, I always fear it could break backup/restore
> > applications. Let's imagine for a minute that a restore does this :
> >
> >  extract(const char *file_name, int file_perms) {
> >    fd = open(".tmpfile", O_CREAT, file_perms);
> >    mmap(fd);
> >    /* actually write file */
> >    close(fd);
> >    unlink(real_file_name);
> >    rename(".tmpfile", file_name);
> >  }
> >
> > Yes I know it's not safe to do the chmod before writing to the file
> > but we could imagine some situations where it makes sense to be done
> > this way (eg: if the file is put into a protected directory), and
> > anyway this possibility is provided by open() and creat() so it is
> > legitimate to imagine these ones could exist.
> >
> > Such a change would slightly modify semantics and affect such use cases
> > *if they exist*, just like using write() instead of mmap() would fail.
> > We could imagine having a sysctl to disable this strengthening, but it
> > is probably not the best solution for the long term either.
> 
> I'd say that this is an acceptable breakage risk.

Yes probably.

> In any event, the
> potential for data loss is limited to a bit of the file mode,

When this bit is the one sudo's setuid, it becomes one of the most important
bits on the whole system :-)

> and
> restore apps like that really don't deserve to work in the first
> place.

I absolutely agree for this specific case. I just wanted to raise this
case so that we're sure not to oversee anything related to other similar
but more justified use cases.

Willy


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-04 17:59                             ` Andy Lutomirski
  2015-11-04 18:15                               ` Willy Tarreau
@ 2015-11-06 21:59                               ` Kees Cook
  2015-11-06 22:30                                 ` Andy Lutomirski
  1 sibling, 1 reply; 35+ messages in thread
From: Kees Cook @ 2015-11-06 21:59 UTC (permalink / raw)
  To: Andy Lutomirski, Theodore Tso
  Cc: Willy Tarreau, Dirk Steinmetz, Michael Kerrisk-manpages,
	Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

Adding Ted, who might know how this all hooks together. (The context
is that a write() or truncate() on a setgid file clears the setgid,
but mmap writes don't.)

On Wed, Nov 4, 2015 at 9:59 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
>> On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>>> Using "write" does kill the set-gid bit. I haven't looked at
>>> why.
>>> Al or anyone else, is there a meaningful distinction here?
>>
>> I remember this one, I got caught once while trying to put a shell into
>> a suid-writable file to get some privileges someone forgot to offer me :-)
>>
>> It's done by should_remove_suid() which is called upon write() and truncate().

file_remove_privs() seems to be the right entry point.
__generic_file_write_iter in mm/filemap.c calls it, though. Are these
callbacks not used for mmap writes?

>>
>>> Should the
>>> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
>>> holding the file open with either open or mmap, I get a Text-in-use
>>> error, so I would kind of expect the same behavior between either
>>> close() and munmap(). I wonder if this is a bug, and if so, then your
>>> link patch is indeed useful again. :)
>>
>> I don't see how this could be done with mmap(). Maybe we have a way to know
>> when the first write is performed via this path, I have no idea.
>
> do_wp_page might be a decent bet.

Or wp_page_shared? Can we get back to a file from the mm at that point?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-06 21:59                               ` Kees Cook
@ 2015-11-06 22:30                                 ` Andy Lutomirski
  2015-11-07  0:11                                   ` Kees Cook
  0 siblings, 1 reply; 35+ messages in thread
From: Andy Lutomirski @ 2015-11-06 22:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Theodore Tso, Willy Tarreau, Dirk Steinmetz,
	Michael Kerrisk-manpages, Serge Hallyn, Seth Forshee,
	Alexander Viro, Linux FS Devel, LKML, Eric W . Biederman,
	Serge Hallyn, security

On Fri, Nov 6, 2015 at 1:59 PM, Kees Cook <keescook@chromium.org> wrote:
> Adding Ted, who might know how this all hooks together. (The context
> is that a write() or truncate() on a setgid file clears the setgid,
> but mmap writes don't.)
>
> On Wed, Nov 4, 2015 at 9:59 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
>>> On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>>>> Using "write" does kill the set-gid bit. I haven't looked at
>>>> why.
>>>> Al or anyone else, is there a meaningful distinction here?
>>>
>>> I remember this one, I got caught once while trying to put a shell into
>>> a suid-writable file to get some privileges someone forgot to offer me :-)
>>>
>>> It's done by should_remove_suid() which is called upon write() and truncate().
>
> file_remove_privs() seems to be the right entry point.
> __generic_file_write_iter in mm/filemap.c calls it, though. Are these
> callbacks not used for mmap writes?

They're certainly not used early enough -- we need to remove suid when
the page becomes writable via mmap (wp_page_shared), not when
writeback happens, or at least not only when writeback happens.

But IIRC mmaped writes go through a different path -- they go through
the address_space ops with names like writepages.

>
>>>
>>>> Should the
>>>> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
>>>> holding the file open with either open or mmap, I get a Text-in-use
>>>> error, so I would kind of expect the same behavior between either
>>>> close() and munmap(). I wonder if this is a bug, and if so, then your
>>>> link patch is indeed useful again. :)
>>>
>>> I don't see how this could be done with mmap(). Maybe we have a way to know
>>> when the first write is performed via this path, I have no idea.
>>
>> do_wp_page might be a decent bet.
>
> Or wp_page_shared? Can we get back to a file from the mm at that point?

vma->vm_file, presumably (after checking whether it's null).
wp_page_shared AFAIK only happens from process context, and the vma
and its file should be valid.

We could also get to an inode via page->address_space->mapping, but
I'm guessing that vma->vm_file would be more appropriate here.

--Andy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-06 22:30                                 ` Andy Lutomirski
@ 2015-11-07  0:11                                   ` Kees Cook
  2015-11-07  0:16                                     ` Kees Cook
  0 siblings, 1 reply; 35+ messages in thread
From: Kees Cook @ 2015-11-07  0:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Theodore Tso, Willy Tarreau, Dirk Steinmetz,
	Michael Kerrisk-manpages, Serge Hallyn, Seth Forshee,
	Alexander Viro, Linux FS Devel, LKML, Eric W . Biederman,
	Serge Hallyn, security

On Fri, Nov 6, 2015 at 2:30 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Nov 6, 2015 at 1:59 PM, Kees Cook <keescook@chromium.org> wrote:
>> Adding Ted, who might know how this all hooks together. (The context
>> is that a write() or truncate() on a setgid file clears the setgid,
>> but mmap writes don't.)
>>
>> On Wed, Nov 4, 2015 at 9:59 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
>>>> On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>>>>> Using "write" does kill the set-gid bit. I haven't looked at
>>>>> why.
>>>>> Al or anyone else, is there a meaningful distinction here?
>>>>
>>>> I remember this one, I got caught once while trying to put a shell into
>>>> a suid-writable file to get some privileges someone forgot to offer me :-)
>>>>
>>>> It's done by should_remove_suid() which is called upon write() and truncate().
>>
>> file_remove_privs() seems to be the right entry point.
>> __generic_file_write_iter in mm/filemap.c calls it, though. Are these
>> callbacks not used for mmap writes?
>
> They're certainly not used early enough -- we need to remove suid when
> the page becomes writable via mmap (wp_page_shared), not when
> writeback happens, or at least not only when writeback happens.

Well, I'm shy about the change there. For example, we don't strip in
on open(RDWR), just on write().

> But IIRC mmaped writes go through a different path -- they go through
> the address_space ops with names like writepages.

Ah-ha.

>>>>> Should the
>>>>> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
>>>>> holding the file open with either open or mmap, I get a Text-in-use
>>>>> error, so I would kind of expect the same behavior between either
>>>>> close() and munmap(). I wonder if this is a bug, and if so, then your
>>>>> link patch is indeed useful again. :)
>>>>
>>>> I don't see how this could be done with mmap(). Maybe we have a way to know
>>>> when the first write is performed via this path, I have no idea.
>>>
>>> do_wp_page might be a decent bet.
>>
>> Or wp_page_shared? Can we get back to a file from the mm at that point?
>
> vma->vm_file, presumably (after checking whether it's null).
> wp_page_shared AFAIK only happens from process context, and the vma
> and its file should be valid.
>
> We could also get to an inode via page->address_space->mapping, but
> I'm guessing that vma->vm_file would be more appropriate here.

Yeah. Let me give it a try...

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-07  0:11                                   ` Kees Cook
@ 2015-11-07  0:16                                     ` Kees Cook
  2015-11-07  0:48                                       ` Andy Lutomirski
  0 siblings, 1 reply; 35+ messages in thread
From: Kees Cook @ 2015-11-07  0:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Theodore Tso, Willy Tarreau, Dirk Steinmetz,
	Michael Kerrisk-manpages, Serge Hallyn, Seth Forshee,
	Alexander Viro, Linux FS Devel, LKML, Eric W . Biederman,
	Serge Hallyn, security

On Fri, Nov 6, 2015 at 4:11 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Nov 6, 2015 at 2:30 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Nov 6, 2015 at 1:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>> Adding Ted, who might know how this all hooks together. (The context
>>> is that a write() or truncate() on a setgid file clears the setgid,
>>> but mmap writes don't.)
>>>
>>> On Wed, Nov 4, 2015 at 9:59 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
>>>>> On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>>>>>> Using "write" does kill the set-gid bit. I haven't looked at
>>>>>> why.
>>>>>> Al or anyone else, is there a meaningful distinction here?
>>>>>
>>>>> I remember this one, I got caught once while trying to put a shell into
>>>>> a suid-writable file to get some privileges someone forgot to offer me :-)
>>>>>
>>>>> It's done by should_remove_suid() which is called upon write() and truncate().
>>>
>>> file_remove_privs() seems to be the right entry point.
>>> __generic_file_write_iter in mm/filemap.c calls it, though. Are these
>>> callbacks not used for mmap writes?
>>
>> They're certainly not used early enough -- we need to remove suid when
>> the page becomes writable via mmap (wp_page_shared), not when
>> writeback happens, or at least not only when writeback happens.
>
> Well, I'm shy about the change there. For example, we don't strip in
> on open(RDWR), just on write().

I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
need to hook the mmap?

-Kees

>
>> But IIRC mmaped writes go through a different path -- they go through
>> the address_space ops with names like writepages.
>
> Ah-ha.
>
>>>>>> Should the
>>>>>> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
>>>>>> holding the file open with either open or mmap, I get a Text-in-use
>>>>>> error, so I would kind of expect the same behavior between either
>>>>>> close() and munmap(). I wonder if this is a bug, and if so, then your
>>>>>> link patch is indeed useful again. :)
>>>>>
>>>>> I don't see how this could be done with mmap(). Maybe we have a way to know
>>>>> when the first write is performed via this path, I have no idea.
>>>>
>>>> do_wp_page might be a decent bet.
>>>
>>> Or wp_page_shared? Can we get back to a file from the mm at that point?
>>
>> vma->vm_file, presumably (after checking whether it's null).
>> wp_page_shared AFAIK only happens from process context, and the vma
>> and its file should be valid.
>>
>> We could also get to an inode via page->address_space->mapping, but
>> I'm guessing that vma->vm_file would be more appropriate here.
>
> Yeah. Let me give it a try...
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-07  0:16                                     ` Kees Cook
@ 2015-11-07  0:48                                       ` Andy Lutomirski
  2015-11-07  5:05                                         ` Kees Cook
  0 siblings, 1 reply; 35+ messages in thread
From: Andy Lutomirski @ 2015-11-07  0:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Theodore Tso, Willy Tarreau, Dirk Steinmetz,
	Michael Kerrisk-manpages, Serge Hallyn, Seth Forshee,
	Alexander Viro, Linux FS Devel, LKML, Eric W . Biederman,
	Serge Hallyn, security

On Fri, Nov 6, 2015 at 4:16 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Nov 6, 2015 at 4:11 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Nov 6, 2015 at 2:30 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Nov 6, 2015 at 1:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>>> Adding Ted, who might know how this all hooks together. (The context
>>>> is that a write() or truncate() on a setgid file clears the setgid,
>>>> but mmap writes don't.)
>>>>
>>>> On Wed, Nov 4, 2015 at 9:59 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
>>>>>> On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>>>>>>> Using "write" does kill the set-gid bit. I haven't looked at
>>>>>>> why.
>>>>>>> Al or anyone else, is there a meaningful distinction here?
>>>>>>
>>>>>> I remember this one, I got caught once while trying to put a shell into
>>>>>> a suid-writable file to get some privileges someone forgot to offer me :-)
>>>>>>
>>>>>> It's done by should_remove_suid() which is called upon write() and truncate().
>>>>
>>>> file_remove_privs() seems to be the right entry point.
>>>> __generic_file_write_iter in mm/filemap.c calls it, though. Are these
>>>> callbacks not used for mmap writes?
>>>
>>> They're certainly not used early enough -- we need to remove suid when
>>> the page becomes writable via mmap (wp_page_shared), not when
>>> writeback happens, or at least not only when writeback happens.
>>
>> Well, I'm shy about the change there. For example, we don't strip in
>> on open(RDWR), just on write().
>
> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
> need to hook the mmap?

But file_update_time already pokes at the same (or nearby) cachelines,
I think -- why would it be expensive?  The whole thing could be
guarded by if (unlikely(is setuid)), right?

--Andy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-07  0:48                                       ` Andy Lutomirski
@ 2015-11-07  5:05                                         ` Kees Cook
  2015-11-08  2:02                                           ` Theodore Ts'o
  0 siblings, 1 reply; 35+ messages in thread
From: Kees Cook @ 2015-11-07  5:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Theodore Tso, Willy Tarreau, Dirk Steinmetz,
	Michael Kerrisk-manpages, Serge Hallyn, Seth Forshee,
	Alexander Viro, Linux FS Devel, LKML, Eric W . Biederman,
	Serge Hallyn, security

On Fri, Nov 6, 2015 at 4:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Nov 6, 2015 at 4:16 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Nov 6, 2015 at 4:11 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Fri, Nov 6, 2015 at 2:30 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Fri, Nov 6, 2015 at 1:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>>>> Adding Ted, who might know how this all hooks together. (The context
>>>>> is that a write() or truncate() on a setgid file clears the setgid,
>>>>> but mmap writes don't.)
>>>>>
>>>>> On Wed, Nov 4, 2015 at 9:59 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>> On Tue, Nov 3, 2015 at 10:58 PM, Willy Tarreau <w@1wt.eu> wrote:
>>>>>>> On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
>>>>>>>> Using "write" does kill the set-gid bit. I haven't looked at
>>>>>>>> why.
>>>>>>>> Al or anyone else, is there a meaningful distinction here?
>>>>>>>
>>>>>>> I remember this one, I got caught once while trying to put a shell into
>>>>>>> a suid-writable file to get some privileges someone forgot to offer me :-)
>>>>>>>
>>>>>>> It's done by should_remove_suid() which is called upon write() and truncate().
>>>>>
>>>>> file_remove_privs() seems to be the right entry point.
>>>>> __generic_file_write_iter in mm/filemap.c calls it, though. Are these
>>>>> callbacks not used for mmap writes?
>>>>
>>>> They're certainly not used early enough -- we need to remove suid when
>>>> the page becomes writable via mmap (wp_page_shared), not when
>>>> writeback happens, or at least not only when writeback happens.
>>>
>>> Well, I'm shy about the change there. For example, we don't strip in
>>> on open(RDWR), just on write().
>>
>> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
>> need to hook the mmap?
>
> But file_update_time already pokes at the same (or nearby) cachelines,
> I think -- why would it be expensive?  The whole thing could be
> guarded by if (unlikely(is setuid)), right?

Yeah, true. I added file_remove_privs calls near all the
file_update_time calls, to no effect. Added to wp_page_shared too,
nothing. Hmmm.

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-07  5:05                                         ` Kees Cook
@ 2015-11-08  2:02                                           ` Theodore Ts'o
  2015-11-10 15:08                                             ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Theodore Ts'o @ 2015-11-08  2:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Theodore Tso, Willy Tarreau, Dirk Steinmetz,
	Michael Kerrisk-manpages, Serge Hallyn, Seth Forshee,
	Alexander Viro, Linux FS Devel, LKML, Eric W . Biederman,
	Serge Hallyn, security

On Fri, Nov 06, 2015 at 09:05:57PM -0800, Kees Cook wrote:
> >>>> They're certainly not used early enough -- we need to remove suid when
> >>>> the page becomes writable via mmap (wp_page_shared), not when
> >>>> writeback happens, or at least not only when writeback happens.
> >>>
> >>> Well, I'm shy about the change there. For example, we don't strip in
> >>> on open(RDWR), just on write().
> >>
> >> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
> >> need to hook the mmap?
> >
> > But file_update_time already pokes at the same (or nearby) cachelines,
> > I think -- why would it be expensive?  The whole thing could be
> > guarded by if (unlikely(is setuid)), right?
> 
> Yeah, true. I added file_remove_privs calls near all the
> file_update_time calls, to no effect. Added to wp_page_shared too,
> nothing. Hmmm.

Why not put the the should_remove_suid() call in
filemap_page_mkwrite(), or maybe do_page_mkwrite()?

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-08  2:02                                           ` Theodore Ts'o
@ 2015-11-10 15:08                                             ` Jan Kara
  2015-11-19 20:11                                               ` Kees Cook
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2015-11-10 15:08 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Kees Cook, Andy Lutomirski, Theodore Tso, Willy Tarreau,
	Dirk Steinmetz, Michael Kerrisk-manpages, Serge Hallyn,
	Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Sat 07-11-15 21:02:06, Ted Tso wrote:
> On Fri, Nov 06, 2015 at 09:05:57PM -0800, Kees Cook wrote:
> > >>>> They're certainly not used early enough -- we need to remove suid when
> > >>>> the page becomes writable via mmap (wp_page_shared), not when
> > >>>> writeback happens, or at least not only when writeback happens.
> > >>>
> > >>> Well, I'm shy about the change there. For example, we don't strip in
> > >>> on open(RDWR), just on write().
> > >>
> > >> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
> > >> need to hook the mmap?
> > >
> > > But file_update_time already pokes at the same (or nearby) cachelines,
> > > I think -- why would it be expensive?  The whole thing could be
> > > guarded by if (unlikely(is setuid)), right?
> > 
> > Yeah, true. I added file_remove_privs calls near all the
> > file_update_time calls, to no effect. Added to wp_page_shared too,
> > nothing. Hmmm.
> 
> Why not put the the should_remove_suid() call in
> filemap_page_mkwrite(), or maybe do_page_mkwrite()?

page_mkwrite() callbacks are IMHO the right place for this check (and
change).  Just next to file_update_time() call. You get proper filesystem
freezing protection etc. As Ted properly mentions filemap_page_mkwrite() is
one place you want to hook into but quite a few filesystems (most notably
ext4, xfs, btrfs) overload ->page_mkwrite() callbacks to their own
functions so each filesystem that does this needs to be updated
separately...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-10 15:08                                             ` Jan Kara
@ 2015-11-19 20:11                                               ` Kees Cook
  2015-11-19 21:57                                                 ` Andy Lutomirski
  2015-11-19 22:02                                                 ` Dave Chinner
  0 siblings, 2 replies; 35+ messages in thread
From: Kees Cook @ 2015-11-19 20:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Theodore Ts'o, Andy Lutomirski, Theodore Tso, Willy Tarreau,
	Dirk Steinmetz, Michael Kerrisk-manpages, Serge Hallyn,
	Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Tue, Nov 10, 2015 at 7:08 AM, Jan Kara <jack@suse.cz> wrote:
> On Sat 07-11-15 21:02:06, Ted Tso wrote:
>> On Fri, Nov 06, 2015 at 09:05:57PM -0800, Kees Cook wrote:
>> > >>>> They're certainly not used early enough -- we need to remove suid when
>> > >>>> the page becomes writable via mmap (wp_page_shared), not when
>> > >>>> writeback happens, or at least not only when writeback happens.
>> > >>>
>> > >>> Well, I'm shy about the change there. For example, we don't strip in
>> > >>> on open(RDWR), just on write().
>> > >>
>> > >> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
>> > >> need to hook the mmap?
>> > >
>> > > But file_update_time already pokes at the same (or nearby) cachelines,
>> > > I think -- why would it be expensive?  The whole thing could be
>> > > guarded by if (unlikely(is setuid)), right?
>> >
>> > Yeah, true. I added file_remove_privs calls near all the
>> > file_update_time calls, to no effect. Added to wp_page_shared too,
>> > nothing. Hmmm.
>>
>> Why not put the the should_remove_suid() call in
>> filemap_page_mkwrite(), or maybe do_page_mkwrite()?
>
> page_mkwrite() callbacks are IMHO the right place for this check (and
> change).  Just next to file_update_time() call. You get proper filesystem

Should file_update_time() just be modified to include
file_remove_privs()? They seem to regularly go together.

> freezing protection etc. As Ted properly mentions filemap_page_mkwrite() is
> one place you want to hook into but quite a few filesystems (most notably
> ext4, xfs, btrfs) overload ->page_mkwrite() callbacks to their own
> functions so each filesystem that does this needs to be updated
> separately...

Depending on each filesystem to do this correctly seems like a
mistake. I was surprised that my attempts (via file_update_time and
wp_page_shared) didn't work. Any other suggestions?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-19 20:11                                               ` Kees Cook
@ 2015-11-19 21:57                                                 ` Andy Lutomirski
  2015-11-19 22:02                                                 ` Dave Chinner
  1 sibling, 0 replies; 35+ messages in thread
From: Andy Lutomirski @ 2015-11-19 21:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: security, Serge Hallyn, Eric W . Biederman, Theodore Ts'o,
	Jan Kara, Theodore Tso, Dirk Steinmetz, Serge Hallyn,
	Willy Tarreau, Alexander Viro, Michael Kerrisk-manpages,
	Linux FS Devel, LKML, Seth Forshee

On Nov 19, 2015 12:11 PM, "Kees Cook" <keescook@chromium.org> wrote:
>
> On Tue, Nov 10, 2015 at 7:08 AM, Jan Kara <jack@suse.cz> wrote:
> > On Sat 07-11-15 21:02:06, Ted Tso wrote:
> >> On Fri, Nov 06, 2015 at 09:05:57PM -0800, Kees Cook wrote:
> >> > >>>> They're certainly not used early enough -- we need to remove suid when
> >> > >>>> the page becomes writable via mmap (wp_page_shared), not when
> >> > >>>> writeback happens, or at least not only when writeback happens.
> >> > >>>
> >> > >>> Well, I'm shy about the change there. For example, we don't strip in
> >> > >>> on open(RDWR), just on write().
> >> > >>
> >> > >> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
> >> > >> need to hook the mmap?
> >> > >
> >> > > But file_update_time already pokes at the same (or nearby) cachelines,
> >> > > I think -- why would it be expensive?  The whole thing could be
> >> > > guarded by if (unlikely(is setuid)), right?
> >> >
> >> > Yeah, true. I added file_remove_privs calls near all the
> >> > file_update_time calls, to no effect. Added to wp_page_shared too,
> >> > nothing. Hmmm.
> >>
> >> Why not put the the should_remove_suid() call in
> >> filemap_page_mkwrite(), or maybe do_page_mkwrite()?
> >
> > page_mkwrite() callbacks are IMHO the right place for this check (and
> > change).  Just next to file_update_time() call. You get proper filesystem
>
> Should file_update_time() just be modified to include
> file_remove_privs()? They seem to regularly go together.
>

No, I think.  The current file_update_time is slow and
POSIX-noncompliant, and I have old patches I need to dig up to fix it.

--Andy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-19 20:11                                               ` Kees Cook
  2015-11-19 21:57                                                 ` Andy Lutomirski
@ 2015-11-19 22:02                                                 ` Dave Chinner
  2015-11-20  0:11                                                   ` Kees Cook
  1 sibling, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2015-11-19 22:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jan Kara, Theodore Ts'o, Andy Lutomirski, Theodore Tso,
	Willy Tarreau, Dirk Steinmetz, Michael Kerrisk-manpages,
	Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Thu, Nov 19, 2015 at 12:11:11PM -0800, Kees Cook wrote:
> On Tue, Nov 10, 2015 at 7:08 AM, Jan Kara <jack@suse.cz> wrote:
> > On Sat 07-11-15 21:02:06, Ted Tso wrote:
> >> On Fri, Nov 06, 2015 at 09:05:57PM -0800, Kees Cook wrote:
> >> > >>>> They're certainly not used early enough -- we need to remove suid when
> >> > >>>> the page becomes writable via mmap (wp_page_shared), not when
> >> > >>>> writeback happens, or at least not only when writeback happens.
> >> > >>>
> >> > >>> Well, I'm shy about the change there. For example, we don't strip in
> >> > >>> on open(RDWR), just on write().
> >> > >>
> >> > >> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
> >> > >> need to hook the mmap?
> >> > >
> >> > > But file_update_time already pokes at the same (or nearby) cachelines,
> >> > > I think -- why would it be expensive?  The whole thing could be
> >> > > guarded by if (unlikely(is setuid)), right?
> >> >
> >> > Yeah, true. I added file_remove_privs calls near all the
> >> > file_update_time calls, to no effect. Added to wp_page_shared too,
> >> > nothing. Hmmm.
> >>
> >> Why not put the the should_remove_suid() call in
> >> filemap_page_mkwrite(), or maybe do_page_mkwrite()?
> >
> > page_mkwrite() callbacks are IMHO the right place for this check (and
> > change).  Just next to file_update_time() call. You get proper filesystem
> 
> Should file_update_time() just be modified to include
> file_remove_privs()? They seem to regularly go together.

They might have similar call sites, but they are completely
different operations. timestamp updates are optional, highly
configurable and behaviour is filesystem implementation specific,
whilst file_remove_privs() is mandatory and must be done in a
crash-safe manner (i.e. via transactions). Hence, IMO, they need to
be kept separate even if the call sites are similar.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
  2015-11-19 22:02                                                 ` Dave Chinner
@ 2015-11-20  0:11                                                   ` Kees Cook
  0 siblings, 0 replies; 35+ messages in thread
From: Kees Cook @ 2015-11-20  0:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Theodore Ts'o, Andy Lutomirski, Theodore Tso,
	Willy Tarreau, Dirk Steinmetz, Michael Kerrisk-manpages,
	Serge Hallyn, Seth Forshee, Alexander Viro, Linux FS Devel, LKML,
	Eric W . Biederman, Serge Hallyn, security

On Thu, Nov 19, 2015 at 2:02 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Nov 19, 2015 at 12:11:11PM -0800, Kees Cook wrote:
>> On Tue, Nov 10, 2015 at 7:08 AM, Jan Kara <jack@suse.cz> wrote:
>> > On Sat 07-11-15 21:02:06, Ted Tso wrote:
>> >> On Fri, Nov 06, 2015 at 09:05:57PM -0800, Kees Cook wrote:
>> >> > >>>> They're certainly not used early enough -- we need to remove suid when
>> >> > >>>> the page becomes writable via mmap (wp_page_shared), not when
>> >> > >>>> writeback happens, or at least not only when writeback happens.
>> >> > >>>
>> >> > >>> Well, I'm shy about the change there. For example, we don't strip in
>> >> > >>> on open(RDWR), just on write().
>> >> > >>
>> >> > >> I take it back. Hooking wp_page_shared looks expensive. :) Maybe we do
>> >> > >> need to hook the mmap?
>> >> > >
>> >> > > But file_update_time already pokes at the same (or nearby) cachelines,
>> >> > > I think -- why would it be expensive?  The whole thing could be
>> >> > > guarded by if (unlikely(is setuid)), right?
>> >> >
>> >> > Yeah, true. I added file_remove_privs calls near all the
>> >> > file_update_time calls, to no effect. Added to wp_page_shared too,
>> >> > nothing. Hmmm.
>> >>
>> >> Why not put the the should_remove_suid() call in
>> >> filemap_page_mkwrite(), or maybe do_page_mkwrite()?
>> >
>> > page_mkwrite() callbacks are IMHO the right place for this check (and
>> > change).  Just next to file_update_time() call. You get proper filesystem
>>
>> Should file_update_time() just be modified to include
>> file_remove_privs()? They seem to regularly go together.
>
> They might have similar call sites, but they are completely
> different operations. timestamp updates are optional, highly
> configurable and behaviour is filesystem implementation specific,
> whilst file_remove_privs() is mandatory and must be done in a
> crash-safe manner (i.e. via transactions). Hence, IMO, they need to
> be kept separate even if the call sites are similar.

Yeah, that was my worry too.

Okay, I think I've got it now. I had misunderstood the purpose of the
page_mkwrite variable in wp_page_reuse. My tests pass now. Patch on
it's way...

-Kees

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2015-11-20  0:11 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-10 14:59 [PATCH] namei: permit linking with CAP_FOWNER in userns Dirk Steinmetz
2015-10-20 14:09 ` Dirk Steinmetz
2015-10-27 14:33   ` Seth Forshee
2015-10-27 18:08     ` Dirk Steinmetz
2015-10-27 20:28       ` Serge Hallyn
2015-10-28 15:07         ` Dirk Steinmetz
2015-10-28 17:33           ` Serge Hallyn
2015-11-02 15:10             ` Dirk Steinmetz
2015-11-02 18:02               ` Serge Hallyn
2015-11-02 19:57                 ` Andy Lutomirski
2015-11-03  0:39                   ` [RFC] namei: prevent sgid-hardlinks for unmapped gids Dirk Steinmetz
2015-11-03 15:44                     ` Serge Hallyn
2015-11-03 18:20                     ` Kees Cook
2015-11-03 23:21                       ` Dirk Steinmetz
2015-11-03 23:29                         ` Kees Cook
2015-11-04  6:58                           ` Willy Tarreau
2015-11-04 17:59                             ` Andy Lutomirski
2015-11-04 18:15                               ` Willy Tarreau
2015-11-04 18:17                                 ` Andy Lutomirski
2015-11-04 18:28                                   ` Willy Tarreau
2015-11-06 21:59                               ` Kees Cook
2015-11-06 22:30                                 ` Andy Lutomirski
2015-11-07  0:11                                   ` Kees Cook
2015-11-07  0:16                                     ` Kees Cook
2015-11-07  0:48                                       ` Andy Lutomirski
2015-11-07  5:05                                         ` Kees Cook
2015-11-08  2:02                                           ` Theodore Ts'o
2015-11-10 15:08                                             ` Jan Kara
2015-11-19 20:11                                               ` Kees Cook
2015-11-19 21:57                                                 ` Andy Lutomirski
2015-11-19 22:02                                                 ` Dave Chinner
2015-11-20  0:11                                                   ` Kees Cook
2015-11-04 14:46                       ` Serge E. Hallyn
2015-10-27 21:04     ` [PATCH] namei: permit linking with CAP_FOWNER in userns Eric W. Biederman
2015-11-03 17:51 ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).