* [PATCH v8] fs: clear file privilege bits when mmap writing
@ 2016-01-12 19:09 ` Kees Cook
  0 siblings, 0 replies; 15+ messages in thread
From: Kees Cook @ 2016-01-12 19:09 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Konstantin Khlebnikov, Andy Lutomirski, Jan Kara, yalin wang,
	Willy Tarreau, Andrew Morton, linux-mm, linux-kernel

Normally, when a user can modify a file that has setuid or setgid bits,
those bits are cleared when they are not the file owner or a member
of the group. This is enforced when using write and truncate but not
when writing to a shared mmap on the file. This could allow the file
writer to gain privileges by changing a binary without losing the
setuid/setgid/caps bits.

Changing the bits requires holding inode->i_mutex, so it cannot be done
during the page fault (due to mmap_sem being held during the fault).
Instead, clear the bits if PROT_WRITE is being used at mmap open time,
or added at mprotect time.

Since we can't do the check in the right place inside mmap (due to
holding mmap_sem), we have to do it before holding mmap_sem, which
means duplicating some checks, which have to be available to the non-MMU
builds too.

When walking VMAs during mprotect, we need to drop mmap_sem (while
holding a file reference) and restart the walk after clearing privileges.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
v8:
- use mmap/mprotect method, with mprotect walk restart, thanks to koct9i
v7:
- document and avoid arch-specific O_* values, viro
v6:
- clarify ETXTBSY situation in comments, luto
v5:
- add to f_flags instead, viro
- add i_mutex during __fput, jack
v4:
- delay removal instead of still needing mmap_sem for mprotect, yalin
v3:
- move outside of mmap_sem for real now, fengguang
- check return code of file_remove_privs, akpm
v2:
- move to mmap from fault handler, jack
---
 include/linux/mm.h |  1 +
 mm/mmap.c          | 20 ++++----------------
 mm/mprotect.c      | 24 ++++++++++++++++++++++++
 mm/util.c          | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 79 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad7793788..b264c8be7114 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1912,6 +1912,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
+extern int do_mmap_shared_checks(struct file *file, unsigned long prot);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..b3424db0a29e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1320,25 +1320,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		return -EAGAIN;
 
 	if (file) {
-		struct inode *inode = file_inode(file);
+		int err;
 
 		switch (flags & MAP_TYPE) {
 		case MAP_SHARED:
-			if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE))
-				return -EACCES;
-
-			/*
-			 * Make sure we don't allow writing to an append-only
-			 * file..
-			 */
-			if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE))
-				return -EACCES;
-
-			/*
-			 * Make sure there are no mandatory locks on the file.
-			 */
-			if (locks_verify_locked(file))
-				return -EAGAIN;
+			err = do_mmap_shared_checks(file, prot);
+			if (err)
+				return err;
 
 			vm_flags |= VM_SHARED | VM_MAYSHARE;
 			if (!(file->f_mode & FMODE_WRITE))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ef5be8eaab00..2e16eaedbca2 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -12,6 +12,7 @@
 #include <linux/hugetlb.h>
 #include <linux/shm.h>
 #include <linux/mman.h>
+#include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/highmem.h>
 #include <linux/security.h>
@@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 
 	vm_flags = calc_vm_prot_bits(prot);
 
+restart:
 	down_write(&current->mm->mmap_sem);
 
 	vma = find_vma(current->mm, start);
@@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 			goto out;
 		}
 
+		/*
+		 * If we're adding write permissions to a shared file,
+		 * we must clear privileges (like done at mmap time),
+		 * but we have to juggle the locks to avoid holding
+		 * mmap_sem while holding i_mutex.
+		 */
+		if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
+		    (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
+		    !IS_NOSEC(file_inode(vma->vm_file))) {
+			struct file *file = get_file(vma->vm_file);
+
+			start = vma->vm_start;
+			up_write(&current->mm->mmap_sem);
+			mutex_lock(&file_inode(file)->i_mutex);
+			error = file_remove_privs(file);
+			mutex_unlock(&file_inode(file)->i_mutex);
+			fput(file);
+			if (error)
+				return error;
+			goto restart;
+		}
+
 		error = security_file_mprotect(vma, reqprot, prot);
 		if (error)
 			goto out;
diff --git a/mm/util.c b/mm/util.c
index 9af1c12b310c..1882eaf33a37 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -283,6 +283,29 @@ int __weak get_user_pages_fast(unsigned long start,
 }
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
+int do_mmap_shared_checks(struct file *file, unsigned long prot)
+{
+	struct inode *inode = file_inode(file);
+
+	if ((prot & PROT_WRITE) && !(file->f_mode & FMODE_WRITE))
+		return -EACCES;
+
+	/*
+	 * Make sure we don't allow writing to an append-only
+	 * file..
+	 */
+	if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE))
+		return -EACCES;
+
+	/*
+	 * Make sure there are no mandatory locks on the file.
+	 */
+	if (locks_verify_locked(file))
+		return -EAGAIN;
+
+	return 0;
+}
+
 unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
 	unsigned long flag, unsigned long pgoff)
@@ -291,6 +314,33 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	struct mm_struct *mm = current->mm;
 	unsigned long populate;
 
+	/*
+	 * If we must remove privs, we do it here since doing it during
+	 * page fault may be expensive and cannot hold inode->i_mutex,
+	 * since mm->mmap_sem is already held.
+	 */
+	if (file && (flag & MAP_TYPE) == MAP_SHARED && (prot & PROT_WRITE)) {
+		struct inode *inode = file_inode(file);
+		int err;
+
+		if (!IS_NOSEC(inode)) {
+			/*
+			 * Make sure we can't strip privs from a file that
+			 * wouldn't otherwise be allowed to be mmapped.
+			 */
+			err = do_mmap_shared_checks(file, prot);
+			if (err)
+				return err;
+
+			mutex_lock(&inode->i_mutex);
+			err = file_remove_privs(file);
+			mutex_unlock(&inode->i_mutex);
+
+			if (err)
+				return err;
+		}
+	}
+
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
 		down_write(&mm->mmap_sem);
-- 
2.6.3


-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply related	[flat|nested] 15+ messages in thread
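
The access pattern the changelog above describes can be exercised from
userspace roughly as follows. This is a minimal sketch, not part of the
patch: sk_mmap_poke() is a hypothetical helper name, and it only reports
the file mode after the store. Whether the setuid/setgid bits survive
depends on the running kernel and on the caller's privileges, so the
sketch makes no claim about the result.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrite the first byte of an existing, non-empty file through a
 * shared writable mapping instead of write(2).
 * Returns st_mode as seen after the store, or (mode_t)-1 on failure.
 */
mode_t sk_mmap_poke(const char *path, char byte)
{
	struct stat st;
	mode_t mode = (mode_t)-1;
	char *map;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return mode;

	/* Touching a page beyond EOF would raise SIGBUS, so refuse empty files. */
	if (fstat(fd, &st) != 0 || st.st_size < 1)
		goto out;

	map = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	map[0] = byte;			/* the store goes through the page fault path */
	msync(map, 1, MS_SYNC);		/* push the dirty page back to the file */
	munmap(map, 1);

	/* Re-read the mode so the caller can compare it with the old one. */
	if (fstat(fd, &st) == 0)
		mode = st.st_mode;
out:
	close(fd);
	return mode;
}
```

Comparing (mode & (S_ISUID | S_ISGID)) before and after the call, as an
unprivileged user, shows whether a given kernel strips the bits on this
path.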

* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
  2016-01-12 19:09 ` Kees Cook
@ 2016-01-13  9:03   ` Jan Kara
  -1 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2016-01-13  9:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Viro, Konstantin Khlebnikov, Andy Lutomirski, Jan Kara,
	yalin wang, Willy Tarreau, Andrew Morton, linux-mm, linux-kernel

On Tue 12-01-16 11:09:04, Kees Cook wrote:
> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member
> of the group. This is enforced when using write and truncate but not
> when writing to a shared mmap on the file. This could allow the file
> writer to gain privileges by changing a binary without losing the
> setuid/setgid/caps bits.
> 
> Changing the bits requires holding inode->i_mutex, so it cannot be done
> during the page fault (due to mmap_sem being held during the fault).
> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
> or added at mprotect time.
> 
> Since we can't do the check in the right place inside mmap (due to
> holding mmap_sem), we have to do it before holding mmap_sem, which
> means duplicating some checks, which have to be available to the non-MMU
> builds too.
> 
> When walking VMAs during mprotect, we need to drop mmap_sem (while
> holding a file reference) and restart the walk after clearing privileges.

...

> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>  
>  	vm_flags = calc_vm_prot_bits(prot);
>  
> +restart:
>  	down_write(&current->mm->mmap_sem);
>  
>  	vma = find_vma(current->mm, start);
> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>  			goto out;
>  		}
>  
> +		/*
> +		 * If we're adding write permissions to a shared file,
> +		 * we must clear privileges (like done at mmap time),
> +		 * but we have to juggle the locks to avoid holding
> +		 * mmap_sem while holding i_mutex.
> +		 */
> +		if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
> +		    (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
> +		    !IS_NOSEC(file_inode(vma->vm_file))) {

This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
is called. However that is not true for two reasons:

1) When you are root, SUID bit doesn't get cleared and thus you cannot set
IS_NOSEC.

2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
never true.

So in these cases you'll loop forever.
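
The two cases above can be modelled in a few lines of userspace C. This
is a deliberately simplified sketch, not kernel code: every sk_* name
and the flag values are made up for illustration, sk_remove_privs()
only mimics the two conditions listed above, and a restart budget is
added so that the non-terminating case returns -1 instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's VM_* bits; values are arbitrary. */
enum { SK_VM_WRITE = 1u << 0, SK_VM_SHARED = 1u << 1 };

struct sk_inode {
	bool nosec;		/* models IS_NOSEC(inode) */
	bool fs_supports_nosec;	/* models MS_NOSEC on the superblock */
};

struct sk_vma {
	unsigned int flags;
	struct sk_inode *inode;	/* NULL models an anonymous mapping */
};

/* Same shape as the condition added to mprotect() in the quoted hunk. */
bool sk_needs_priv_clear(const struct sk_vma *vma, unsigned int newflags)
{
	return (vma->flags & SK_VM_SHARED) && vma->inode &&
	       (newflags & SK_VM_WRITE) && !(vma->flags & SK_VM_WRITE) &&
	       !vma->inode->nosec;
}

/* Models file_remove_privs(): nosec is set only when both cases allow it. */
void sk_remove_privs(struct sk_inode *inode, bool caller_is_root)
{
	if (!caller_is_root && inode->fs_supports_nosec)
		inode->nosec = true;
}

/*
 * Models the "goto restart" walk for a single VMA.
 * Returns the number of restarts, or -1 once max_restarts is exhausted,
 * which is where the unbounded version would loop forever.
 */
int sk_mprotect_walk(struct sk_vma *vma, unsigned int newflags,
		     bool caller_is_root, int max_restarts)
{
	int restarts = 0;

	assert(max_restarts >= 0);
	while (sk_needs_priv_clear(vma, newflags)) {
		if (restarts == max_restarts)
			return -1;
		sk_remove_privs(vma->inode, caller_is_root);
		restarts++;
	}
	vma->flags |= newflags & SK_VM_WRITE;
	return restarts;
}
```

With an unprivileged caller on a filesystem that supports the flag, the
walk restarts once and then proceeds. With caller_is_root set, or with
fs_supports_nosec clear, the condition never changes and only the
budget stops the loop.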

You can check SUID bits without i_mutex so that could be done without
dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
without i_mutex as that checks extended attributes (IMA) and that needs
i_mutex to be held to avoid races with someone else changing the attributes
under you.

Honestly, I don't see a way of implementing this in mprotect() which would
be reasonably elegant.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
  2016-01-13  9:03   ` Jan Kara
@ 2016-01-13 16:09     ` Kees Cook
  -1 siblings, 0 replies; 15+ messages in thread
From: Kees Cook @ 2016-01-13 16:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Konstantin Khlebnikov, Andy Lutomirski,
	yalin wang, Willy Tarreau, Andrew Morton, Linux-MM, LKML

On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member
>> of the group. This is enforced when using write and truncate but not
>> when writing to a shared mmap on the file. This could allow the file
>> writer to gain privileges by changing a binary without losing the
>> setuid/setgid/caps bits.
>>
>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> during the page fault (due to mmap_sem being held during the fault).
>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>> or added at mprotect time.
>>
>> Since we can't do the check in the right place inside mmap (due to
>> holding mmap_sem), we have to do it before holding mmap_sem, which
>> means duplicating some checks, which have to be available to the non-MMU
>> builds too.
>>
>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>> holding a file reference) and restart the walk after clearing privileges.
>
> ...
>
>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>
>>       vm_flags = calc_vm_prot_bits(prot);
>>
>> +restart:
>>       down_write(&current->mm->mmap_sem);
>>
>>       vma = find_vma(current->mm, start);
>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>                       goto out;
>>               }
>>
>> +             /*
>> +              * If we're adding write permissions to a shared file,
>> +              * we must clear privileges (like done at mmap time),
>> +              * but we have to juggle the locks to avoid holding
>> +              * mmap_sem while holding i_mutex.
>> +              */
>> +             if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>> +                 (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>> +                 !IS_NOSEC(file_inode(vma->vm_file))) {
>
> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
> is called. However that is not true for two reasons:
>
> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
> IS_NOSEC.
>
> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
> never true.
>
> So in these cases you'll loop forever.

UUuugh.

>
> You can check SUID bits without i_mutex so that could be done without
> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
> without i_mutex as that checks extended attributes (IMA) and that needs
> i_mutex to be held to avoid races with someone else changing the attributes
> under you.

Yeah, that's why I changed this from Konstantin's original suggestion.

> Honestly, I don't see a way of implementing this in mprotect() which would
> be reasonably elegant.

Konstantin, any thoughts here?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
  2016-01-13 16:09     ` Kees Cook
@ 2016-01-13 20:23       ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 15+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-13 20:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
	Willy Tarreau, Andrew Morton, Linux-MM, LKML

On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>> those bits are cleared when they are not the file owner or a member
>>> of the group. This is enforced when using write and truncate but not
>>> when writing to a shared mmap on the file. This could allow the file
>>> writer to gain privileges by changing a binary without losing the
>>> setuid/setgid/caps bits.
>>>
>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>> during the page fault (due to mmap_sem being held during the fault).
>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>>> or added at mprotect time.
>>>
>>> Since we can't do the check in the right place inside mmap (due to
>>> holding mmap_sem), we have to do it before holding mmap_sem, which
>>> means duplicating some checks, which have to be available to the non-MMU
>>> builds too.
>>>
>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>>> holding a file reference) and restart the walk after clearing privileges.
>>
>> ...
>>
>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>
>>>       vm_flags = calc_vm_prot_bits(prot);
>>>
>>> +restart:
>>>       down_write(&current->mm->mmap_sem);
>>>
>>>       vma = find_vma(current->mm, start);
>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>                       goto out;
>>>               }
>>>
>>> +             /*
>>> +              * If we're adding write permissions to a shared file,
>>> +              * we must clear privileges (like done at mmap time),
>>> +              * but we have to juggle the locks to avoid holding
>>> +              * mmap_sem while holding i_mutex.
>>> +              */
>>> +             if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>>> +                 (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>> +                 !IS_NOSEC(file_inode(vma->vm_file))) {
>>
>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
>> is called. However that is not true for two reasons:
>>
>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
>> IS_NOSEC.
>>
>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
>> never true.
>>
>> So in these cases you'll loop forever.
>
> UUuugh.
>
>>
>> You can check SUID bits without i_mutex so that could be done without
>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
>> without i_mutex as that checks extended attributes (IMA) and that needs
>> i_mutex to be held to avoid races with someone else changing the attributes
>> under you.
>
> Yeah, that's why I changed this from Konstantin's original suggestion.
>
>> Honestly, I don't see a way of implementing this in mprotect() which would
>> be reasonably elegant.
>
> Konstantin, any thoughts here?

getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't take it.
If somebody changes xattrs under us, we'll end up in a race anyway.
But this is still safe: setxattr calls are synchronized.
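
For reference, the kind of lookup being discussed has a rough userspace
analogue. This is a sketch under assumptions, not the kernel's
implementation: sk_file_has_privs() is a hypothetical name, the
setuid/setgid test mirrors the usual "setgid only counts with group
execute" rule, and the xattr probed is security.capability, which is
where file capabilities are stored. It uses fstat(2) and fgetxattr(2)
on one descriptor so both checks look at the same inode.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Userspace approximation of "would a write need to strip privileges?".
 * Returns 1 if the file has setuid, setgid plus group-execute, or a
 * security.capability xattr; 0 if it has none of them; -1 on error.
 */
int sk_file_has_privs(const char *path)
{
	struct stat st;
	ssize_t len;
	int ret = -1;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (fstat(fd, &st) != 0)
		goto out;

	if ((st.st_mode & S_ISUID) ||
	    ((st.st_mode & S_ISGID) && (st.st_mode & S_IXGRP))) {
		ret = 1;
		goto out;
	}

	/* A size of 0 asks only for the attribute length; no data is copied. */
	len = fgetxattr(fd, "security.capability", NULL, 0);
	if (len >= 0)
		ret = 1;
	else if (errno == ENODATA || errno == ENOTSUP)
		ret = 0;	/* attribute absent, or xattrs unsupported here */
out:
	close(fd);
	return ret;
}
```

Nothing here takes a lock visible to the caller, so the answer can be
stale by the time it is used, which is the race mentioned above.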

>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
  2016-01-13 20:23       ` Konstantin Khlebnikov
@ 2016-01-13 20:33         ` Kees Cook
  -1 siblings, 0 replies; 15+ messages in thread
From: Kees Cook @ 2016-01-13 20:33 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
	Willy Tarreau, Andrew Morton, Linux-MM, LKML

On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
<koct9i@gmail.com> wrote:
> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>> those bits are cleared when they are not the file owner or a member
>>>> of the group. This is enforced when using write and truncate but not
>>>> when writing to a shared mmap on the file. This could allow the file
>>>> writer to gain privileges by changing a binary without losing the
>>>> setuid/setgid/caps bits.
>>>>
>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>> during the page fault (due to mmap_sem being held during the fault).
>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>>>> or added at mprotect time.
>>>>
>>>> Since we can't do the check in the right place inside mmap (due to
>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
>>>> means duplicating some checks, which have to be available to the non-MMU
>>>> builds too.
>>>>
>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>>>> holding a file reference) and restart the walk after clearing privileges.
>>>
>>> ...
>>>
>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>>
>>>>       vm_flags = calc_vm_prot_bits(prot);
>>>>
>>>> +restart:
>>>>       down_write(&current->mm->mmap_sem);
>>>>
>>>>       vma = find_vma(current->mm, start);
>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>>                       goto out;
>>>>               }
>>>>
>>>> +             /*
>>>> +              * If we're adding write permissions to a shared file,
>>>> +              * we must clear privileges (like done at mmap time),
>>>> +              * but we have to juggle the locks to avoid holding
>>>> +              * mmap_sem while holding i_mutex.
>>>> +              */
>>>> +             if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>>>> +                 (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>>> +                 !IS_NOSEC(file_inode(vma->vm_file))) {
>>>
>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
>>> is called. However that is not true for two reasons:
>>>
>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
>>> IS_NOSEC.
>>>
>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
>>> never true.
>>>
>>> So in these cases you'll loop forever.
>>
>> UUuugh.
>>
>>>
>>> You can check SUID bits without i_mutex so that could be done without
>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
>>> without i_mutex as that checks extended attributes (IMA) and that needs
>>> i_mutex to be held to avoid races with someone else changing the attributes
>>> under you.
>>
>> Yeah, that's why I changed this from Konstantin's original suggestion.
>>
>>> Honestly, I don't see a way of implementing this in mprotect() which would
>>> be reasonably elegant.
>>
>> Konstantin, any thoughts here?
>
> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't lock it.
> If somebody changes xattrs under us we'll end up in a race anyway.
> But this is still safe: setxattrs are synchronized.

So I can swap my IS_NOSEC for your original file_needs_remove_privs()?
Are the LSM hooks expecting to be called under mmap_sem? (Looks like
only common_caps implements that, though.)
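
To make the difference between the two restart conditions concrete, here is a
small userspace sketch. It is a toy model of the two cases Jan listed, not
kernel code: every toy_* name is hypothetical, "caller_privileged" stands in
for the root case, and "fs_caches_nosec" stands in for a filesystem that sets
MS_NOSEC. The restart limit exists only so the model can report a loop that
would otherwise never end.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_inode {
	bool suid;            /* models the SUID mode bit */
	bool nosec;           /* models the cached "nothing to strip" flag */
	bool fs_caches_nosec; /* models a filesystem that sets MS_NOSEC */
};

/*
 * Models privilege removal as described in the thread: a privileged
 * caller keeps the SUID bit, and the flag is cached only when the bit
 * is gone and the filesystem opts in.
 */
static void toy_remove_privs(struct toy_inode *ino, bool caller_privileged)
{
	if (!caller_privileged)
		ino->suid = false;
	if (!ino->suid && ino->fs_caches_nosec)
		ino->nosec = true;
}

/* Restart condition modeled on the v8 check: "flag not cached yet". */
static bool toy_restart_on_flag(const struct toy_inode *ino,
				bool caller_privileged)
{
	(void)caller_privileged;
	return !ino->nosec;
}

/* Restart condition that asks whether removal would still change anything. */
static bool toy_restart_on_need(const struct toy_inode *ino,
				bool caller_privileged)
{
	return ino->suid && !caller_privileged;
}

typedef bool (*toy_pred)(const struct toy_inode *, bool);

/*
 * Models the "drop the lock, clear privileges, restart the walk" loop.
 * Returns the number of restarts, or -1 if the limit was hit, which
 * stands for a loop that would never terminate.
 */
static int toy_walk(struct toy_inode ino, bool caller_privileged,
		    toy_pred must_restart, int limit)
{
	int restarts = 0;

	assert(limit > 0);
	while (must_restart(&ino, caller_privileged)) {
		if (restarts == limit)
			return -1;
		toy_remove_privs(&ino, caller_privileged);
		restarts++;
	}
	return restarts;
}
```

In this model the flag-based condition never settles for a privileged caller
or on a filesystem that does not cache the flag, while the need-based
condition settles after at most one restart in each case.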

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
  2016-01-13 20:33         ` Kees Cook
@ 2016-01-14  7:35           ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 15+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-14  7:35 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
	Willy Tarreau, Andrew Morton, Linux-MM, LKML

On Wed, Jan 13, 2016 at 11:33 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
> <koct9i@gmail.com> wrote:
>> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
>>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>>> those bits are cleared when they are not the file owner or a member
>>>>> of the group. This is enforced when using write and truncate but not
>>>>> when writing to a shared mmap on the file. This could allow the file
>>>>> writer to gain privileges by changing a binary without losing the
>>>>> setuid/setgid/caps bits.
>>>>>
>>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>>> during the page fault (due to mmap_sem being held during the fault).
>>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>>>>> or added at mprotect time.
>>>>>
>>>>> Since we can't do the check in the right place inside mmap (due to
>>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
>>>>> means duplicating some checks, which have to be available to the non-MMU
>>>>> builds too.
>>>>>
>>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>>>>> holding a file reference) and restart the walk after clearing privileges.
>>>>
>>>> ...
>>>>
>>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>>>
>>>>>       vm_flags = calc_vm_prot_bits(prot);
>>>>>
>>>>> +restart:
>>>>>       down_write(&current->mm->mmap_sem);
>>>>>
>>>>>       vma = find_vma(current->mm, start);
>>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>>>                       goto out;
>>>>>               }
>>>>>
>>>>> +             /*
>>>>> +              * If we're adding write permissions to a shared file,
>>>>> +              * we must clear privileges (like done at mmap time),
>>>>> +              * but we have to juggle the locks to avoid holding
>>>>> +              * mmap_sem while holding i_mutex.
>>>>> +              */
>>>>> +             if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>>>>> +                 (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>>>> +                 !IS_NOSEC(file_inode(vma->vm_file))) {
>>>>
>>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
>>>> is called. However that is not true for two reasons:
>>>>
>>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
>>>> IS_NOSEC.
>>>>
>>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
>>>> never true.
>>>>
>>>> So in these cases you'll loop forever.
>>>
>>> UUuugh.
>>>
>>>>
>>>> You can check SUID bits without i_mutex so that could be done without
>>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
>>>> without i_mutex as that checks extended attributes (IMA) and that needs
>>>> i_mutex to be held to avoid races with someone else changing the attributes
>>>> under you.
>>>
>>> Yeah, that's why I changed this from Konstantin's original suggestion.
>>>
>>>> Honestly, I don't see a way of implementing this in mprotect() which would
>>>> be reasonably elegant.
>>>
>>> Konstantin, any thoughts here?
>>
>> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't lock it.
>> If somebody changes xattrs under us we'll end up in a race anyway.
>> But this is still safe: setxattrs are synchronized.
>
> So I can swap my IS_NOSEC for your original file_needs_remove_privs()?
> Are the LSM hooks expecting to be called under mmap_sem? (Looks like
> only common_caps implements that, though.)

getxattr should nest inside mmap_sem safely: it has a sort of
"readpage" semantics; in fact, ext4 uses it when it inlines the
content of tiny files into an xattr.

>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
  2016-01-14  7:35           ` Konstantin Khlebnikov
  (?)
@ 2016-01-15 10:17             ` Jan Kara
  -1 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2016-01-15 10:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Kees Cook, Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
	Willy Tarreau, Andrew Morton, Linux-MM, LKML, mfasheh,
	ocfs2-devel

On Thu 14-01-16 10:35:17, Konstantin Khlebnikov wrote:
> On Wed, Jan 13, 2016 at 11:33 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
> > <koct9i@gmail.com> wrote:
> >> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
> >>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
> >>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
> >>>>> Normally, when a user can modify a file that has setuid or setgid bits,
> >>>>> those bits are cleared when they are not the file owner or a member
> >>>>> of the group. This is enforced when using write and truncate but not
> >>>>> when writing to a shared mmap on the file. This could allow the file
> >>>>> writer to gain privileges by changing a binary without losing the
> >>>>> setuid/setgid/caps bits.
> >>>>>
> >>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
> >>>>> during the page fault (due to mmap_sem being held during the fault).
> >>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
> >>>>> or added at mprotect time.
> >>>>>
> >>>>> Since we can't do the check in the right place inside mmap (due to
> >>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
> >>>>> means duplicating some checks, which have to be available to the non-MMU
> >>>>> builds too.
> >>>>>
> >>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
> >>>>> holding a file reference) and restart the walk after clearing privileges.
> >>>>
> >>>> ...
> >>>>
> >>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>>
> >>>>>       vm_flags = calc_vm_prot_bits(prot);
> >>>>>
> >>>>> +restart:
> >>>>>       down_write(&current->mm->mmap_sem);
> >>>>>
> >>>>>       vma = find_vma(current->mm, start);
> >>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>>                       goto out;
> >>>>>               }
> >>>>>
> >>>>> +             /*
> >>>>> +              * If we're adding write permissions to a shared file,
> >>>>> +              * we must clear privileges (like done at mmap time),
> >>>>> +              * but we have to juggle the locks to avoid holding
> >>>>> +              * mmap_sem while holding i_mutex.
> >>>>> +              */
> >>>>> +             if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
> >>>>> +                 (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
> >>>>> +                 !IS_NOSEC(file_inode(vma->vm_file))) {
> >>>>
> >>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
> >>>> is called. However that is not true for two reasons:
> >>>>
> >>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
> >>>> IS_NOSEC.
> >>>>
> >>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
> >>>> never true.
> >>>>
> >>>> So in these cases you'll loop forever.
> >>>
> >>> UUuugh.
> >>>
> >>>>
> >>>> You can check SUID bits without i_mutex so that could be done without
> >>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
> >>>> without i_mutex as that checks extended attributes (IMA) and that needs
> >>>> i_mutex to be held to avoid races with someone else changing the attributes
> >>>> under you.
> >>>
> >>> Yeah, that's why I changed this from Konstantin's original suggestion.
> >>>
> >>>> Honestly, I don't see a way of implementing this in mprotect() which would
> >>>> be reasonably elegant.
> >>>
> >>> Konstantin, any thoughts here?
> >>
> >> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't lock it.
> >> If somebody changes xattrs under us we'll end up in a race anyway.
> >> But this is still safe: setxattrs are synchronized.
> >
> > So I can swap my IS_NOSEC for your original file_needs_remove_privs()?
> > Are the LSM hooks expecting to be called under mmap_sem? (Looks like
> > only common_caps implements that, though.)
> 
> getxattr should nest inside mmap_sem safely: it has a sort of
> "readpage" semantics; in fact, ext4 uses it when it inlines the
> content of tiny files into an xattr.

First, sorry Kees for misleading you. Somehow I missed that i_mutex is not
actually acquired for getxattr() calls.

I have checked, and lots of filesystems have a dedicated xattr semaphore which
should be safe to nest inside mmap_sem. There are filesystems like ocfs2 or
gfs2 which use their equivalent of i_mutex for xattr locking, so there we
would create a lock inversion when calling file_needs_remove_privs() from
under mmap_sem.

That being said, at least OCFS2 has other issues with this xattr locking
scheme and they are working on changing things AFAIK. Mark, can you perhaps
comment?
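
The lock inversion described above can be sketched with a tiny userspace
lock-order recorder. This is only an illustration under stated assumptions:
the toy_* names are hypothetical, the recorder is far simpler than the
kernel's lockdep, and the "inode lock, then mmap_sem" order is assumed to be
established by some other path (for example a write that faults while holding
the inode lock).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock classes; hypothetical names, not kernel identifiers. */
enum toy_lock { TOY_MMAP_SEM, TOY_INODE_LOCK, TOY_XATTR_SEM, TOY_NLOCKS };

struct toy_lockdep {
	bool held[TOY_NLOCKS];
	/* before[a][b]: lock a was held at some point while b was taken */
	bool before[TOY_NLOCKS][TOY_NLOCKS];
};

/*
 * Records an acquisition. Returns false if it inverts an order that
 * was recorded earlier (the classic ABBA pattern).
 */
static bool toy_acquire(struct toy_lockdep *ld, enum toy_lock l)
{
	bool ok = true;

	assert(l < TOY_NLOCKS);
	for (int h = 0; h < TOY_NLOCKS; h++) {
		if (!ld->held[h] || h == (int)l)
			continue;
		if (ld->before[l][h])
			ok = false; /* the reverse order was seen before */
		ld->before[h][l] = true;
	}
	ld->held[l] = true;
	return ok;
}

static void toy_release(struct toy_lockdep *ld, enum toy_lock l)
{
	ld->held[l] = false;
}

/*
 * Replays the assumed existing order, then the proposed one: read an
 * xattr while mmap_sem is held. xattr_lock is whatever lock the
 * filesystem uses to protect xattr reads. Returns true if no
 * inversion was recorded.
 */
static bool toy_check_under_mmap_sem(enum toy_lock xattr_lock)
{
	struct toy_lockdep ld = {0};
	bool ok = true;

	/* Assumed existing path: inode lock first, then mmap_sem. */
	if (!toy_acquire(&ld, TOY_INODE_LOCK))
		ok = false;
	if (!toy_acquire(&ld, TOY_MMAP_SEM))
		ok = false;
	toy_release(&ld, TOY_MMAP_SEM);
	toy_release(&ld, TOY_INODE_LOCK);

	/* Proposed path: mmap_sem first, then the xattr lock. */
	if (!toy_acquire(&ld, TOY_MMAP_SEM))
		ok = false;
	if (!toy_acquire(&ld, xattr_lock))
		ok = false;
	toy_release(&ld, xattr_lock);
	toy_release(&ld, TOY_MMAP_SEM);

	return ok;
}
```

With a dedicated xattr semaphore (TOY_XATTR_SEM) the recorder sees no
inversion; when the filesystem reuses its inode lock for xattrs
(TOY_INODE_LOCK) the second path inverts the first.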

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

Thread overview: 15+ messages
2016-01-12 19:09 [PATCH v8] fs: clear file privilege bits when mmap writing Kees Cook
2016-01-13  9:03 ` Jan Kara
2016-01-13 16:09   ` Kees Cook
2016-01-13 20:23     ` Konstantin Khlebnikov
2016-01-13 20:33       ` Kees Cook
2016-01-14  7:35         ` Konstantin Khlebnikov
2016-01-15 10:17           ` Jan Kara
