* [PATCH v8] fs: clear file privilege bits when mmap writing
From: Kees Cook @ 2016-01-12 19:09 UTC
To: Alexander Viro
Cc: Konstantin Khlebnikov, Andy Lutomirski, Jan Kara, yalin wang,
Willy Tarreau, Andrew Morton, linux-mm, linux-kernel

Normally, when a user who is not the file owner (or a member of the
file's group) modifies a file that has the setuid or setgid bit set, those
bits are cleared. This is enforced for write and truncate, but not when
writing through a shared mmap of the file. This could allow the writer to
gain privileges by changing a binary without the binary losing its
setuid/setgid/caps bits.

Changing the bits requires holding inode->i_mutex, so it cannot be done
in the page fault handler, where mmap_sem is already held. Instead, clear
the bits when PROT_WRITE is requested at mmap time, or added later via
mprotect.
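
A minimal userspace model of when the patch decides to clear privileges,
written as two predicates. The struct, its fields (has_file, shared, nosec)
and both function names are hypothetical stand-ins for the kernel's file
pointer, the MAP_SHARED/VM_SHARED flag and the IS_NOSEC() test; this only
sketches the conditions used in vm_mmap_pgoff() and mprotect() in the diff
below, and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state the patch inspects. */
struct mapping_state {
	bool has_file;	/* file (mmap) or vma->vm_file (mprotect) is non-NULL */
	bool shared;	/* MAP_SHARED at mmap time, VM_SHARED at mprotect time */
	bool nosec;	/* IS_NOSEC(inode): inode known to carry no privileges */
};

/* mmap path: a writable shared mapping of a possibly-privileged file. */
static bool mmap_needs_priv_clear(const struct mapping_state *s,
				  bool prot_write)
{
	assert(s != NULL);
	return s->has_file && s->shared && prot_write && !s->nosec;
}

/* mprotect path: only when write permission is newly being added. */
static bool mprotect_needs_priv_clear(const struct mapping_state *s,
				      bool old_write, bool new_write)
{
	assert(s != NULL);
	return s->shared && s->has_file && new_write && !old_write &&
	       !s->nosec;
}
```

For a shared, non-NOSEC file mapping, mmap_needs_priv_clear(&s, true) and
mprotect_needs_priv_clear(&s, false, true) are true, while
mprotect_needs_priv_clear(&s, true, true) is false: mprotect only acts when
write permission is being added, not when it was already present.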

Since the check cannot be done at the natural place inside mmap (mmap_sem
is held there), it has to be done before mmap_sem is taken. That means
running some of the MAP_SHARED checks twice, so they are factored out into
do_mmap_shared_checks() in mm/util.c, where non-MMU builds can use them too.
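
The ordering can be sketched as follows: the shared-mapping checks run
first, and privileges are stripped only if they pass, so a mapping that
would be refused cannot be used to strip a file's bits. file_model and both
function names are hypothetical; shared_checks() follows the order of the
checks in do_mmap_shared_checks() from the diff, and clearing `privileged`
stands in for calling file_remove_privs() under i_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-file state the shared-mapping checks read. */
struct file_model {
	bool opened_for_write;	/* file->f_mode & FMODE_WRITE */
	bool append_only;	/* IS_APPEND(inode) */
	bool mand_locked;	/* locks_verify_locked(file) != 0 */
	bool privileged;	/* setuid/setgid/caps still present */
};

/* Same order of checks as do_mmap_shared_checks(). */
static int shared_checks(const struct file_model *f, bool prot_write)
{
	if (prot_write && !f->opened_for_write)
		return -EACCES;
	if (f->append_only && f->opened_for_write)
		return -EACCES;
	if (f->mand_locked)
		return -EAGAIN;
	return 0;
}

/* Called only for writable MAP_SHARED requests: validate first, strip after. */
static int prepare_shared_write_map(struct file_model *f, bool prot_write)
{
	assert(f != NULL);
	int err = shared_checks(f, prot_write);

	if (err)
		return err;	/* refused mapping: privileges left untouched */
	f->privileged = false;	/* stands in for file_remove_privs() */
	return 0;
}
```

With a file opened read-only, prepare_shared_write_map(&f, true) returns
-EACCES and leaves `privileged` set; with a file opened for writing it
returns 0 and clears it.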

When walking the VMAs during mprotect, we need to drop mmap_sem (while
holding a reference to the file), clear the privileges, and then restart
the walk.
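
The drop-and-restart walk can be illustrated with a userspace analogue
using POSIX threads. Here `outer` plays the role of mmap_sem and each
file's `inner` mutex plays i_mutex; the rule being modeled is that `inner`
is never taken while `outer` is held. All names are hypothetical, the
atomic refcount stands in for get_file()/fput(), and `privileged` is peeked
at without `inner`, much as the patch tests IS_NOSEC() without i_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: "outer" ~ mmap_sem, "inner" ~ i_mutex. */
struct file_obj {
	pthread_mutex_t inner;
	atomic_int refcount;	/* ~ get_file()/fput() */
	bool privileged;	/* written under inner, peeked at under outer */
};

struct region {
	struct file_obj *file;
	bool shared;
	bool writable;
};

static pthread_mutex_t outer = PTHREAD_MUTEX_INITIALIZER;

/* Make regions [0, n) writable, clearing file privileges first without ever
 * nesting inner inside outer. Returns how many times the walk restarted. */
static int make_writable(struct region *r, size_t n)
{
	int restarts = 0;

restart:
	pthread_mutex_lock(&outer);
	for (size_t i = 0; i < n; i++) {
		struct file_obj *f = r[i].file;

		if (r[i].shared && f && !r[i].writable && f->privileged) {
			atomic_fetch_add(&f->refcount, 1);	/* pin the file */
			pthread_mutex_unlock(&outer);		/* drop "mmap_sem" */

			pthread_mutex_lock(&f->inner);		/* take "i_mutex" */
			f->privileged = false;			/* ~ file_remove_privs() */
			pthread_mutex_unlock(&f->inner);

			atomic_fetch_sub(&f->refcount, 1);	/* unpin */
			restarts++;
			goto restart;	/* regions may have changed while unlocked */
		}
		r[i].writable = true;
	}
	pthread_mutex_unlock(&outer);
	return restarts;
}
```

Termination of this walk depends on the condition that triggers the
restart (`privileged` here) actually becoming false after the inner step;
if that step can leave it true, the walk spins forever.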
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v8:
- use mmap/mprotect method, with mprotect walk restart, thanks to koct9i
v7:
- document and avoid arch-specific O_* values, viro
v6:
- clarify ETXTBSY situation in comments, luto
v5:
- add to f_flags instead, viro
- add i_mutex during __fput, jack
v4:
- delay removal instead of still needing mmap_sem for mprotect, yalin
v3:
- move outside of mmap_sem for real now, fengguang
- check return code of file_remove_privs, akpm
v2:
- move to mmap from fault handler, jack
---
include/linux/mm.h | 1 +
mm/mmap.c | 20 ++++----------------
mm/mprotect.c | 24 ++++++++++++++++++++++++
mm/util.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 79 insertions(+), 16 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad7793788..b264c8be7114 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1912,6 +1912,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
+extern int do_mmap_shared_checks(struct file *file, unsigned long prot);
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..b3424db0a29e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1320,25 +1320,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
return -EAGAIN;
if (file) {
- struct inode *inode = file_inode(file);
+ int err;
switch (flags & MAP_TYPE) {
case MAP_SHARED:
- if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE))
- return -EACCES;
-
- /*
- * Make sure we don't allow writing to an append-only
- * file..
- */
- if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE))
- return -EACCES;
-
- /*
- * Make sure there are no mandatory locks on the file.
- */
- if (locks_verify_locked(file))
- return -EAGAIN;
+ err = do_mmap_shared_checks(file, prot);
+ if (err)
+ return err;
vm_flags |= VM_SHARED | VM_MAYSHARE;
if (!(file->f_mode & FMODE_WRITE))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ef5be8eaab00..2e16eaedbca2 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -12,6 +12,7 @@
#include <linux/hugetlb.h>
#include <linux/shm.h>
#include <linux/mman.h>
+#include <linux/file.h>
#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/security.h>
@@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
vm_flags = calc_vm_prot_bits(prot);
+restart:
down_write(&current->mm->mmap_sem);
vma = find_vma(current->mm, start);
@@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
goto out;
}
+ /*
+ * If we're adding write permissions to a shared file,
+ * we must clear privileges (like done at mmap time),
+ * but we have to juggle the locks to avoid holding
+ * mmap_sem while holding i_mutex.
+ */
+ if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
+ (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
+ !IS_NOSEC(file_inode(vma->vm_file))) {
+ struct file *file = get_file(vma->vm_file);
+
+ start = vma->vm_start;
+ up_write(&current->mm->mmap_sem);
+ mutex_lock(&file_inode(file)->i_mutex);
+ error = file_remove_privs(file);
+ mutex_unlock(&file_inode(file)->i_mutex);
+ fput(file);
+ if (error)
+ return error;
+ goto restart;
+ }
+
error = security_file_mprotect(vma, reqprot, prot);
if (error)
goto out;
diff --git a/mm/util.c b/mm/util.c
index 9af1c12b310c..1882eaf33a37 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -283,6 +283,29 @@ int __weak get_user_pages_fast(unsigned long start,
}
EXPORT_SYMBOL_GPL(get_user_pages_fast);
+int do_mmap_shared_checks(struct file *file, unsigned long prot)
+{
+ struct inode *inode = file_inode(file);
+
+ if ((prot & PROT_WRITE) && !(file->f_mode & FMODE_WRITE))
+ return -EACCES;
+
+ /*
+ * Make sure we don't allow writing to an append-only
+ * file..
+ */
+ if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE))
+ return -EACCES;
+
+ /*
+ * Make sure there are no mandatory locks on the file.
+ */
+ if (locks_verify_locked(file))
+ return -EAGAIN;
+
+ return 0;
+}
+
unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot,
unsigned long flag, unsigned long pgoff)
@@ -291,6 +314,33 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
struct mm_struct *mm = current->mm;
unsigned long populate;
+ /*
+ * If we must remove privs, we do it here since doing it during
+ * page fault may be expensive and cannot hold inode->i_mutex,
+ * since mm->mmap_sem is already held.
+ */
+ if (file && (flag & MAP_TYPE) == MAP_SHARED && (prot & PROT_WRITE)) {
+ struct inode *inode = file_inode(file);
+ int err;
+
+ if (!IS_NOSEC(inode)) {
+ /*
+ * Make sure we can't strip privs from a file that
+ * wouldn't otherwise be allowed to be mmapped.
+ */
+ err = do_mmap_shared_checks(file, prot);
+ if (err)
+ return err;
+
+ mutex_lock(&inode->i_mutex);
+ err = file_remove_privs(file);
+ mutex_unlock(&inode->i_mutex);
+
+ if (err)
+ return err;
+ }
+ }
+
ret = security_mmap_file(file, prot, flag);
if (!ret) {
down_write(&mm->mmap_sem);
--
2.6.3
--
Kees Cook
Chrome OS & Brillo Security
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
From: Jan Kara @ 2016-01-13 9:03 UTC
To: Kees Cook
Cc: Alexander Viro, Konstantin Khlebnikov, Andy Lutomirski, Jan Kara,
yalin wang, Willy Tarreau, Andrew Morton, linux-mm, linux-kernel
On Tue 12-01-16 11:09:04, Kees Cook wrote:
> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member
> of the group. This is enforced when using write and truncate but not
> when writing to a shared mmap on the file. This could allow the file
> writer to gain privileges by changing a binary without losing the
> setuid/setgid/caps bits.
>
> Changing the bits requires holding inode->i_mutex, so it cannot be done
> during the page fault (due to mmap_sem being held during the fault).
> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
> or added at mprotect time.
>
> Since we can't do the check in the right place inside mmap (due to
> holding mmap_sem), we have to do it before holding mmap_sem, which
> means duplicating some checks, which have to be available to the non-MMU
> builds too.
>
> When walking VMAs during mprotect, we need to drop mmap_sem (while
> holding a file reference) and restart the walk after clearing privileges.
...
> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>
> vm_flags = calc_vm_prot_bits(prot);
>
> +restart:
> down_write(&current->mm->mmap_sem);
>
> vma = find_vma(current->mm, start);
> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> goto out;
> }
>
> + /*
> + * If we're adding write permissions to a shared file,
> + * we must clear privileges (like done at mmap time),
> + * but we have to juggle the locks to avoid holding
> + * mmap_sem while holding i_mutex.
> + */
> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
> + !IS_NOSEC(file_inode(vma->vm_file))) {

This code assumes that IS_NOSEC gets set for the inode once
file_remove_privs() is called. However, that is not true, for two reasons:

1) When you are root, the SUID bit doesn't get cleared, and thus you
cannot set IS_NOSEC.

2) Some filesystems do not have MS_NOSEC set, and for those IS_NOSEC is
never true.

So in these cases you'll loop forever.
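
To make the failure mode concrete, here is a small model of the retry
condition. remove_privs_model() is a hypothetical simplification of
file_remove_privs(): it clears setuid unless the caller has CAP_FSETID, and
caches "no privileges" (S_NOSEC) only when no setuid bit remains and the
filesystem has MS_NOSEC. Setgid and capability xattrs are ignored, and
max_tries is an artificial bound so the endless case can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the inode state involved in the retry condition. */
struct inode_model {
	bool suid;	/* S_ISUID in i_mode */
	bool sb_nosec;	/* superblock has MS_NOSEC */
	bool nosec;	/* S_NOSEC cached in i_flags, i.e. IS_NOSEC() */
};

/* Stand-in for file_remove_privs(): root (CAP_FSETID) keeps SUID, and the
 * "no privileges" flag is cached only where the filesystem allows it. */
static void remove_privs_model(struct inode_model *in, bool has_cap_fsetid)
{
	if (in->suid && !has_cap_fsetid)
		in->suid = false;
	if (!in->suid && in->sb_nosec)
		in->nosec = true;
}

/* The v8 mprotect retry, reduced to its condition (!IS_NOSEC). Returns the
 * number of restarts, or -1 if max_tries was hit (i.e. it would spin). */
static int mprotect_restarts(struct inode_model *in, bool has_cap_fsetid,
			     int max_tries)
{
	for (int tries = 0; tries < max_tries; tries++) {
		if (in->nosec)
			return tries;	/* condition finally false: walk ends */
		remove_privs_model(in, has_cap_fsetid);
	}
	return -1;
}
```

For an ordinary user on an MS_NOSEC filesystem, mprotect_restarts()
returns 1; for root, or on a filesystem without MS_NOSEC, it returns -1,
matching the two cases listed above.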

You can check the SUID bits without i_mutex, so that part could be done
without dropping mmap_sem. But you cannot easily call
security_inode_need_killpriv() without i_mutex, since it checks extended
attributes (IMA), and that needs i_mutex held to avoid races with someone
else changing the attributes under you.
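
The mode-bit half of the check is a pure function of i_mode, which is why
it can be evaluated without i_mutex. The sketch below is modeled on the
logic of the kernel's should_remove_suid(); KILL_SUID/KILL_SGID are
hypothetical mirrors of ATTR_KILL_SUID/ATTR_KILL_SGID, and the
capability-xattr part (security_inode_need_killpriv()) is deliberately left
out, since that is the piece that needs the lock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical mirrors of the kernel's ATTR_KILL_SUID / ATTR_KILL_SGID. */
enum { KILL_SUID = 1, KILL_SGID = 2 };

/* Which set-id bits a write should drop, computed from the mode alone. */
static int mode_kill_bits(mode_t mode, bool has_cap_fsetid)
{
	int kill = 0;

	if (mode & S_ISUID)
		kill |= KILL_SUID;	/* any setuid bit is a candidate */
	if ((mode & S_ISGID) && (mode & S_IXGRP))
		kill |= KILL_SGID;	/* real sgid; without g+x it is a locking mark */
	if (kill && !has_cap_fsetid && S_ISREG(mode))
		return kill;
	return 0;			/* e.g. CAP_FSETID holders keep the bits */
}
```

For example, a regular 04755 file yields KILL_SUID for an unprivileged
writer but 0 for a caller with CAP_FSETID, which is the root case above.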
Honestly, I don't see a way of implementing this in mprotect() which would
be reasonably elegant.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
From: Kees Cook @ 2016-01-13 16:09 UTC
To: Jan Kara
Cc: Alexander Viro, Konstantin Khlebnikov, Andy Lutomirski,
yalin wang, Willy Tarreau, Andrew Morton, Linux-MM, LKML
On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member
>> of the group. This is enforced when using write and truncate but not
>> when writing to a shared mmap on the file. This could allow the file
>> writer to gain privileges by changing a binary without losing the
>> setuid/setgid/caps bits.
>>
>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> during the page fault (due to mmap_sem being held during the fault).
>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>> or added at mprotect time.
>>
>> Since we can't do the check in the right place inside mmap (due to
>> holding mmap_sem), we have to do it before holding mmap_sem, which
>> means duplicating some checks, which have to be available to the non-MMU
>> builds too.
>>
>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>> holding a file reference) and restart the walk after clearing privileges.
>
> ...
>
>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>
>> vm_flags = calc_vm_prot_bits(prot);
>>
>> +restart:
>> down_write(&current->mm->mmap_sem);
>>
>> vma = find_vma(current->mm, start);
>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>> goto out;
>> }
>>
>> + /*
>> + * If we're adding write permissions to a shared file,
>> + * we must clear privileges (like done at mmap time),
>> + * but we have to juggle the locks to avoid holding
>> + * mmap_sem while holding i_mutex.
>> + */
>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>> + !IS_NOSEC(file_inode(vma->vm_file))) {
>
> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
> is called. However that is not true for two reasons:
>
> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
> IS_NOSEC.
>
> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
> never true.
>
> So in these cases you'll loop forever.
UUuugh.
>
> You can check SUID bits without i_mutex so that could be done without
> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
> without i_mutex as that checks extended attributes (IMA) and that needs
> i_mutex to be held to avoid races with someone else changing the attributes
> under you.
Yeah, that's why I changed this from Konstantin's original suggestion.
> Honestly, I don't see a way of implementing this in mprotect() which would
> be reasonably elegant.
Konstantin, any thoughts here?
-Kees
--
Kees Cook
Chrome OS & Brillo Security
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
From: Konstantin Khlebnikov @ 2016-01-13 20:23 UTC
To: Kees Cook
Cc: Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
Willy Tarreau, Andrew Morton, Linux-MM, LKML
On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>> those bits are cleared when they are not the file owner or a member
>>> of the group. This is enforced when using write and truncate but not
>>> when writing to a shared mmap on the file. This could allow the file
>>> writer to gain privileges by changing a binary without losing the
>>> setuid/setgid/caps bits.
>>>
>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>> during the page fault (due to mmap_sem being held during the fault).
>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>>> or added at mprotect time.
>>>
>>> Since we can't do the check in the right place inside mmap (due to
>>> holding mmap_sem), we have to do it before holding mmap_sem, which
>>> means duplicating some checks, which have to be available to the non-MMU
>>> builds too.
>>>
>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>>> holding a file reference) and restart the walk after clearing privileges.
>>
>> ...
>>
>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>
>>> vm_flags = calc_vm_prot_bits(prot);
>>>
>>> +restart:
>>> down_write(&current->mm->mmap_sem);
>>>
>>> vma = find_vma(current->mm, start);
>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>> goto out;
>>> }
>>>
>>> + /*
>>> + * If we're adding write permissions to a shared file,
>>> + * we must clear privileges (like done at mmap time),
>>> + * but we have to juggle the locks to avoid holding
>>> + * mmap_sem while holding i_mutex.
>>> + */
>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>> + !IS_NOSEC(file_inode(vma->vm_file))) {
>>
>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
>> is called. However that is not true for two reasons:
>>
>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
>> IS_NOSEC.
>>
>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
>> never true.
>>
>> So in these cases you'll loop forever.
>
> UUuugh.
>
>>
>> You can check SUID bits without i_mutex so that could be done without
>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
>> without i_mutex as that checks extended attributes (IMA) and that needs
>> i_mutex to be held to avoid races with someone else changing the attributes
>> under you.
>
> Yeah, that's why I changed this from Konstantin's original suggestion.
>
>> Honestly, I don't see a way of implementing this in mprotect() which would
>> be reasonably elegant.
>
> Konstantin, any thoughts here?
getxattr works fine without i_mutex: sys_getxattr()/vfs_getxattr() don't
take it. If somebody changes the xattrs under us, we'll race anyway, but
this is still safe: setxattr calls are synchronized.
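
A userspace analogue of the lockless lookup described here: on Linux,
getxattr(2) reads an attribute without the caller holding any inode lock.
The helper name is hypothetical, and it only probes for a
"security.capability" attribute (file capabilities); it sketches a lockless
probe, not what security_inode_need_killpriv() does internally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Returns 1 if the file has a "security.capability" xattr, 0 if it does not
 * (or the filesystem has no xattr support), and -1 on any other error.
 * No lock is held across the call, so the answer may be stale by the time
 * the caller acts on it. */
static int has_file_caps(const char *path)
{
	/* size 0 asks only for the value's length, not its contents */
	ssize_t len = getxattr(path, "security.capability", NULL, 0);

	if (len >= 0)
		return 1;
	if (errno == ENODATA || errno == ENOTSUP)
		return 0;
	return -1;
}
```

A nonexistent path yields -1 (ENOENT). A concurrent setxattr can change the
answer right after the call returns; the point is only that the read itself
does not require the lock.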
>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
@ 2016-01-13 20:23 ` Konstantin Khlebnikov
0 siblings, 0 replies; 15+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-13 20:23 UTC (permalink / raw)
To: Kees Cook
Cc: Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
Willy Tarreau, Andrew Morton, Linux-MM, LKML
On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>> those bits are cleared when they are not the file owner or a member
>>> of the group. This is enforced when using write and truncate but not
>>> when writing to a shared mmap on the file. This could allow the file
>>> writer to gain privileges by changing a binary without losing the
>>> setuid/setgid/caps bits.
>>>
>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>> during the page fault (due to mmap_sem being held during the fault).
>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>>> or added at mprotect time.
>>>
>>> Since we can't do the check in the right place inside mmap (due to
>>> holding mmap_sem), we have to do it before holding mmap_sem, which
>>> means duplicating some checks, which have to be available to the non-MMU
>>> builds too.
>>>
>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>>> holding a file reference) and restart the walk after clearing privileges.
>>
>> ...
>>
>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>
>>> vm_flags = calc_vm_prot_bits(prot);
>>>
>>> +restart:
>>> down_write(¤t->mm->mmap_sem);
>>>
>>> vma = find_vma(current->mm, start);
>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>> goto out;
>>> }
>>>
>>> + /*
>>> + * If we're adding write permissions to a shared file,
>>> + * we must clear privileges (like done at mmap time),
>>> + * but we have to juggle the locks to avoid holding
>>> + * mmap_sem while holding i_mutex.
>>> + */
>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>> + !IS_NOSEC(file_inode(vma->vm_file))) {
>>
>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
>> is called. However that is not true for two reasons:
>>
>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
>> IS_NOSEC.
>>
>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
>> never true.
>>
>> So in these cases you'll loop forever.
>
> UUuugh.
>
>>
>> You can check SUID bits without i_mutex so that could be done without
>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
>> without i_mutex as that checks extended attributes (IMA) and that needs
>> i_mutex to be held to avoid races with someone else changing the attributes
>> under you.
>
> Yeah, that's why I changed this from Konstantin's original suggestion.
>
>> Honestly, I don't see a way of implementing this in mprotect() which would
>> be reasonably elegant.
>
> Konstantin, any thoughts here?
Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't lock it.
If somebody changes xattrs under us, we'll end up in a race anyway.
But this is still safe: setxattr calls are synchronized.
>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
2016-01-13 20:23 ` Konstantin Khlebnikov
@ 2016-01-13 20:33 ` Kees Cook
-1 siblings, 0 replies; 15+ messages in thread
From: Kees Cook @ 2016-01-13 20:33 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
Willy Tarreau, Andrew Morton, Linux-MM, LKML
On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
<koct9i@gmail.com> wrote:
> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>> those bits are cleared when they are not the file owner or a member
>>>> of the group. This is enforced when using write and truncate but not
>>>> when writing to a shared mmap on the file. This could allow the file
>>>> writer to gain privileges by changing a binary without losing the
>>>> setuid/setgid/caps bits.
>>>>
>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>> during the page fault (due to mmap_sem being held during the fault).
>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>>>> or added at mprotect time.
>>>>
>>>> Since we can't do the check in the right place inside mmap (due to
>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
>>>> means duplicating some checks, which have to be available to the non-MMU
>>>> builds too.
>>>>
>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>>>> holding a file reference) and restart the walk after clearing privileges.
>>>
>>> ...
>>>
>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>>
>>>> vm_flags = calc_vm_prot_bits(prot);
>>>>
>>>> +restart:
>>>> down_write(&current->mm->mmap_sem);
>>>>
>>>> vma = find_vma(current->mm, start);
>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>> goto out;
>>>> }
>>>>
>>>> + /*
>>>> + * If we're adding write permissions to a shared file,
>>>> + * we must clear privileges (like done at mmap time),
>>>> + * but we have to juggle the locks to avoid holding
>>>> + * mmap_sem while holding i_mutex.
>>>> + */
>>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>>> + !IS_NOSEC(file_inode(vma->vm_file))) {
>>>
>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
>>> is called. However that is not true for two reasons:
>>>
>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
>>> IS_NOSEC.
>>>
>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
>>> never true.
>>>
>>> So in these cases you'll loop forever.
>>
>> UUuugh.
>>
>>>
>>> You can check SUID bits without i_mutex so that could be done without
>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
>>> without i_mutex as that checks extended attributes (IMA) and that needs
>>> i_mutex to be held to avoid races with someone else changing the attributes
>>> under you.
>>
>> Yeah, that's why I changed this from Konstantin's original suggestion.
>>
>>> Honestly, I don't see a way of implementing this in mprotect() which would
>>> be reasonably elegant.
>>
>> Konstantin, any thoughts here?
>
> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't lock it.
> If somebody changes xattrs under us, we'll end up in a race anyway.
> But this is still safe: setxattr calls are synchronized.
So can I swap my IS_NOSEC check for your original file_needs_remove_privs()?
Are the LSM hooks expecting to be called under mmap_sem? (Looks like
only common_caps implements that, though.)
-Kees
--
Kees Cook
Chrome OS & Brillo Security
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
2016-01-13 20:33 ` Kees Cook
@ 2016-01-14 7:35 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 15+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-14 7:35 UTC (permalink / raw)
To: Kees Cook
Cc: Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
Willy Tarreau, Andrew Morton, Linux-MM, LKML
On Wed, Jan 13, 2016 at 11:33 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
> <koct9i@gmail.com> wrote:
>> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
>>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
>>>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>>>> those bits are cleared when they are not the file owner or a member
>>>>> of the group. This is enforced when using write and truncate but not
>>>>> when writing to a shared mmap on the file. This could allow the file
>>>>> writer to gain privileges by changing a binary without losing the
>>>>> setuid/setgid/caps bits.
>>>>>
>>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>>>> during the page fault (due to mmap_sem being held during the fault).
>>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
>>>>> or added at mprotect time.
>>>>>
>>>>> Since we can't do the check in the right place inside mmap (due to
>>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
>>>>> means duplicating some checks, which have to be available to the non-MMU
>>>>> builds too.
>>>>>
>>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
>>>>> holding a file reference) and restart the walk after clearing privileges.
>>>>
>>>> ...
>>>>
>>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>>>
>>>>> vm_flags = calc_vm_prot_bits(prot);
>>>>>
>>>>> +restart:
>>>>> down_write(&current->mm->mmap_sem);
>>>>>
>>>>> vma = find_vma(current->mm, start);
>>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
>>>>> goto out;
>>>>> }
>>>>>
>>>>> + /*
>>>>> + * If we're adding write permissions to a shared file,
>>>>> + * we must clear privileges (like done at mmap time),
>>>>> + * but we have to juggle the locks to avoid holding
>>>>> + * mmap_sem while holding i_mutex.
>>>>> + */
>>>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
>>>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
>>>>> + !IS_NOSEC(file_inode(vma->vm_file))) {
>>>>
>>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
>>>> is called. However that is not true for two reasons:
>>>>
>>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
>>>> IS_NOSEC.
>>>>
>>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
>>>> never true.
>>>>
>>>> So in these cases you'll loop forever.
>>>
>>> UUuugh.
>>>
>>>>
>>>> You can check SUID bits without i_mutex so that could be done without
>>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
>>>> without i_mutex as that checks extended attributes (IMA) and that needs
>>>> i_mutex to be held to avoid races with someone else changing the attributes
>>>> under you.
>>>
>>> Yeah, that's why I changed this from Konstantin's original suggestion.
>>>
>>>> Honestly, I don't see a way of implementing this in mprotect() which would
>>>> be reasonably elegant.
>>>
>>> Konstantin, any thoughts here?
>>
>> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't lock it.
>> If somebody changes xattrs under us, we'll end up in a race anyway.
>> But this is still safe: setxattr calls are synchronized.
>
> So can I swap my IS_NOSEC check for your original file_needs_remove_privs()?
> Are the LSM hooks expecting to be called under mmap_sem? (Looks like
> only common_caps implements that, though.)
getxattr should nest safely inside mmap_sem: it has sort of "readpage"
semantics. Actually, ext4 uses it when it inlines the content of tiny
files into xattrs.
>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
2016-01-14 7:35 ` Konstantin Khlebnikov
(?)
@ 2016-01-15 10:17 ` Jan Kara
-1 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2016-01-15 10:17 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Kees Cook, Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
Willy Tarreau, Andrew Morton, Linux-MM, LKML, mfasheh,
ocfs2-devel
On Thu 14-01-16 10:35:17, Konstantin Khlebnikov wrote:
> On Wed, Jan 13, 2016 at 11:33 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
> > <koct9i@gmail.com> wrote:
> >> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
> >>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
> >>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
> >>>>> Normally, when a user can modify a file that has setuid or setgid bits,
> >>>>> those bits are cleared when they are not the file owner or a member
> >>>>> of the group. This is enforced when using write and truncate but not
> >>>>> when writing to a shared mmap on the file. This could allow the file
> >>>>> writer to gain privileges by changing a binary without losing the
> >>>>> setuid/setgid/caps bits.
> >>>>>
> >>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
> >>>>> during the page fault (due to mmap_sem being held during the fault).
> >>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
> >>>>> or added at mprotect time.
> >>>>>
> >>>>> Since we can't do the check in the right place inside mmap (due to
> >>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
> >>>>> means duplicating some checks, which have to be available to the non-MMU
> >>>>> builds too.
> >>>>>
> >>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
> >>>>> holding a file reference) and restart the walk after clearing privileges.
> >>>>
> >>>> ...
> >>>>
> >>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>>
> >>>>> vm_flags = calc_vm_prot_bits(prot);
> >>>>>
> >>>>> +restart:
> >>>>> down_write(&current->mm->mmap_sem);
> >>>>>
> >>>>> vma = find_vma(current->mm, start);
> >>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>> goto out;
> >>>>> }
> >>>>>
> >>>>> + /*
> >>>>> + * If we're adding write permissions to a shared file,
> >>>>> + * we must clear privileges (like done at mmap time),
> >>>>> + * but we have to juggle the locks to avoid holding
> >>>>> + * mmap_sem while holding i_mutex.
> >>>>> + */
> >>>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
> >>>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
> >>>>> + !IS_NOSEC(file_inode(vma->vm_file))) {
> >>>>
> >>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
> >>>> is called. However that is not true for two reasons:
> >>>>
> >>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
> >>>> IS_NOSEC.
> >>>>
> >>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
> >>>> never true.
> >>>>
> >>>> So in these cases you'll loop forever.
> >>>
> >>> UUuugh.
> >>>
> >>>>
> >>>> You can check SUID bits without i_mutex so that could be done without
> >>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
> >>>> without i_mutex as that checks extended attributes (IMA) and that needs
> >>>> i_mutex to be held to avoid races with someone else changing the attributes
> >>>> under you.
> >>>
> >>> Yeah, that's why I changed this from Konstantin's original suggestion.
> >>>
> >>>> Honestly, I don't see a way of implementing this in mprotect() which would
> >>>> be reasonably elegant.
> >>>
> >>> Konstantin, any thoughts here?
> >>
> >> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr don't lock it.
> >> If somebody changes xattrs under us, we'll end up in a race anyway.
> >> But this is still safe: setxattr calls are synchronized.
> >
> > So can I swap my IS_NOSEC check for your original file_needs_remove_privs()?
> > Are the LSM hooks expecting to be called under mmap_sem? (Looks like
> > only common_caps implements that, though.)
>
> getxattr should nest safely inside mmap_sem: it has sort of "readpage"
> semantics. Actually, ext4 uses it when it inlines the content of tiny
> files into xattrs.
First, sorry Kees for misleading you. Somehow I missed that i_mutex is not
actually acquired for getxattr() calls.
I have checked, and lots of filesystems have a dedicated xattr semaphore,
which should be safe to nest inside mmap_sem. There are filesystems like
ocfs2 or gfs2 which use their equivalent of i_mutex for xattr locking, so
for them we would create a lock inversion when calling
file_needs_remove_privs() from under mmap_sem.
That being said, at least OCFS2 has other issues with this xattr locking
scheme, and AFAIK they are working on changing things. Mark, can you
perhaps comment?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Ocfs2-devel] [PATCH v8] fs: clear file privilege bits when mmap writing
@ 2016-01-15 10:17 ` Jan Kara
0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2016-01-15 10:17 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Kees Cook, Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
Willy Tarreau, Andrew Morton, Linux-MM, LKML, mfasheh,
ocfs2-devel
On Thu 14-01-16 10:35:17, Konstantin Khlebnikov wrote:
> On Wed, Jan 13, 2016 at 11:33 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
> > <koct9i@gmail.com> wrote:
> >> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
> >>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
> >>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
> >>>>> Normally, when a user can modify a file that has setuid or setgid bits,
> >>>>> those bits are cleared when they are not the file owner or a member
> >>>>> of the group. This is enforced when using write and truncate but not
> >>>>> when writing to a shared mmap on the file. This could allow the file
> >>>>> writer to gain privileges by changing a binary without losing the
> >>>>> setuid/setgid/caps bits.
> >>>>>
> >>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
> >>>>> during the page fault (due to mmap_sem being held during the fault).
> >>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
> >>>>> or added at mprotect time.
> >>>>>
> >>>>> Since we can't do the check in the right place inside mmap (due to
> >>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
> >>>>> means duplicating some checks, which have to be available to the non-MMU
> >>>>> builds too.
> >>>>>
> >>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
> >>>>> holding a file reference) and restart the walk after clearing privileges.
> >>>>
> >>>> ...
> >>>>
> >>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>>
> >>>>> vm_flags = calc_vm_prot_bits(prot);
> >>>>>
> >>>>> +restart:
> >>>>> down_write(¤t->mm->mmap_sem);
> >>>>>
> >>>>> vma = find_vma(current->mm, start);
> >>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>> goto out;
> >>>>> }
> >>>>>
> >>>>> + /*
> >>>>> + * If we're adding write permissions to a shared file,
> >>>>> + * we must clear privileges (like done at mmap time),
> >>>>> + * but we have to juggle the locks to avoid holding
> >>>>> + * mmap_sem while holding i_mutex.
> >>>>> + */
> >>>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
> >>>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
> >>>>> + !IS_NOSEC(file_inode(vma->vm_file))) {
> >>>>
> >>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
> >>>> is called. However that is not true for two reasons:
> >>>>
> >>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
> >>>> IS_NOSEC.
> >>>>
> >>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
> >>>> never true.
> >>>>
> >>>> So in these cases you'll loop forever.
> >>>
> >>> UUuugh.
> >>>
> >>>>
> >>>> You can check SUID bits without i_mutex so that could be done without
> >>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
> >>>> without i_mutex as that checks extended attributes (IMA) and that needs
> >>>> i_mutex to be held to avoid races with someone else changing the attributes
> >>>> under you.
> >>>
> >>> Yeah, that's why I changed this from Konstantin's original suggestion.
> >>>
> >>>> Honestly, I don't see a way of implementing this in mprotect() which would
> >>>> be reasonably elegant.
> >>>
> >>> Konstantin, any thoughts here?
> >>
> >> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr doesn't lock it.
> >> If somebody changes xattrs under us we'll end up in race anyway.
> >> But this still safe: setxattrs are sychronized.
> >
> > So I can swap my IS_NOSEC for your original file_needs_remove_privs()?
> > Are the LSM hooks expecting to be called under mm_sem? (Looks like
> > only common_caps implements that, though.)
>
> getxattr should nests inside mmap_sem safely: it has sort of
> "readpage" semantics,
> actually ext4 uses it when inlines content of tiny files into xattr.
First, sorry Kees for misleading you. Somehow I missed that i_mutex is not
actually acquired for getxattr() calls.
I have checked and lots of filesystems have dedicated xattr semaphore which
should be safe to nest inside mmap_sem. There are filesystems like ocfs2 or
gfs2 which use their equivalent of i_mutex for xattr locking so there we
would create lock inversion when calling file_needs_remove_privs() from
under mmap_sem.
That being said at least OCFS2 has other issues with this xattr locking
scheme and they are working on changing things AFAIK. Mark can you perhaps
comment?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v8] fs: clear file privilege bits when mmap writing
@ 2016-01-15 10:17 ` Jan Kara
0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2016-01-15 10:17 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Kees Cook, Jan Kara, Alexander Viro, Andy Lutomirski, yalin wang,
Willy Tarreau, Andrew Morton, Linux-MM, LKML, mfasheh,
ocfs2-devel
On Thu 14-01-16 10:35:17, Konstantin Khlebnikov wrote:
> On Wed, Jan 13, 2016 at 11:33 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Jan 13, 2016 at 12:23 PM, Konstantin Khlebnikov
> > <koct9i@gmail.com> wrote:
> >> On Wed, Jan 13, 2016 at 7:09 PM, Kees Cook <keescook@chromium.org> wrote:
> >>> On Wed, Jan 13, 2016 at 1:03 AM, Jan Kara <jack@suse.cz> wrote:
> >>>> On Tue 12-01-16 11:09:04, Kees Cook wrote:
> >>>>> Normally, when a user can modify a file that has setuid or setgid bits,
> >>>>> those bits are cleared when they are not the file owner or a member
> >>>>> of the group. This is enforced when using write and truncate but not
> >>>>> when writing to a shared mmap on the file. This could allow the file
> >>>>> writer to gain privileges by changing a binary without losing the
> >>>>> setuid/setgid/caps bits.
> >>>>>
> >>>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
> >>>>> during the page fault (due to mmap_sem being held during the fault).
> >>>>> Instead, clear the bits if PROT_WRITE is being used at mmap open time,
> >>>>> or added at mprotect time.
> >>>>>
> >>>>> Since we can't do the check in the right place inside mmap (due to
> >>>>> holding mmap_sem), we have to do it before holding mmap_sem, which
> >>>>> means duplicating some checks, which have to be available to the non-MMU
> >>>>> builds too.
> >>>>>
> >>>>> When walking VMAs during mprotect, we need to drop mmap_sem (while
> >>>>> holding a file reference) and restart the walk after clearing privileges.
> >>>>
> >>>> ...
> >>>>
> >>>>> @@ -375,6 +376,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>>
> >>>>> vm_flags = calc_vm_prot_bits(prot);
> >>>>>
> >>>>> +restart:
> >>>>> down_write(¤t->mm->mmap_sem);
> >>>>>
> >>>>> vma = find_vma(current->mm, start);
> >>>>> @@ -416,6 +418,28 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >>>>> goto out;
> >>>>> }
> >>>>>
> >>>>> + /*
> >>>>> + * If we're adding write permissions to a shared file,
> >>>>> + * we must clear privileges (like done at mmap time),
> >>>>> + * but we have to juggle the locks to avoid holding
> >>>>> + * mmap_sem while holding i_mutex.
> >>>>> + */
> >>>>> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file &&
> >>>>> + (newflags & VM_WRITE) && !(vma->vm_flags & VM_WRITE) &&
> >>>>> + !IS_NOSEC(file_inode(vma->vm_file))) {
> >>>>
> >>>> This code assumes that IS_NOSEC gets set for inode once file_remove_privs()
> >>>> is called. However that is not true for two reasons:
> >>>>
> >>>> 1) When you are root, SUID bit doesn't get cleared and thus you cannot set
> >>>> IS_NOSEC.
> >>>>
> >>>> 2) Some filesystems do not have MS_NOSEC set and for those IS_NOSEC is
> >>>> never true.
> >>>>
> >>>> So in these cases you'll loop forever.
> >>>
> >>> UUuugh.
> >>>
> >>>>
> >>>> You can check SUID bits without i_mutex so that could be done without
> >>>> dropping mmap_sem but you cannot easily call security_inode_need_killpriv()
> >>>> without i_mutex as that checks extended attributes (IMA) and that needs
> >>>> i_mutex to be held to avoid races with someone else changing the attributes
> >>>> under you.
> >>>
> >>> Yeah, that's why I changed this from Konstantin's original suggestion.
> >>>
> >>>> Honestly, I don't see a way of implementing this in mprotect() which would
> >>>> be reasonably elegant.
> >>>
> >>> Konstantin, any thoughts here?
> >>
> >> Getxattr works fine without i_mutex: sys_getxattr/vfs_getxattr doesn't lock it.
> >> If somebody changes xattrs under us we'll end up in a race anyway.
> >> But this is still safe: setxattrs are synchronized.
> >
> > So I can swap my IS_NOSEC for your original file_needs_remove_privs()?
> > Are the LSM hooks expecting to be called under mmap_sem? (Looks like
> > only common_caps implements that, though.)
>
> getxattr should nest inside mmap_sem safely: it has sort of "readpage"
> semantics; actually ext4 uses it when it inlines the content of tiny files
> into xattrs.

First, sorry, Kees, for misleading you. Somehow I missed that i_mutex is not
actually acquired for getxattr() calls.

I have checked and lots of filesystems have a dedicated xattr semaphore which
should be safe to nest inside mmap_sem. There are filesystems like ocfs2 or
gfs2 which use their equivalent of i_mutex for xattr locking, so there we
would create a lock inversion when calling file_needs_remove_privs() from
under mmap_sem.

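The inversion described above can be illustrated with a toy lock-order recorder in the spirit of lockdep. This is a hypothetical sketch, not lockdep's API: `enum lock_id`, `record_nesting()` and `xattr_under_mmap_sem_inverts()` are invented for illustration. It assumes the usual ordering in which a write() holding i_mutex can fault on the user buffer and take mmap_sem, and it assumes that on ocfs2/gfs2 the xattr lock behaves like i_mutex for ordering purposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers for this model; not lockdep classes. */
enum lock_id { MMAP_SEM, I_MUTEX, XATTR_SEM, NUM_LOCKS };

/* order[a][b] is true once lock b has been acquired while holding lock a. */
static bool order[NUM_LOCKS][NUM_LOCKS];

/* Records that `inner` was taken while `outer` was held and reports whether
 * the opposite nesting was already seen (a potential ABBA deadlock). */
static bool record_nesting(enum lock_id outer, enum lock_id inner)
{
    order[outer][inner] = true;
    return order[inner][outer];
}

/* Replays both paths from a clean state; returns true on inversion. */
static bool xattr_under_mmap_sem_inverts(bool xattr_lock_is_i_mutex)
{
    memset(order, 0, sizeof(order));
    /* write(): copying from user memory under i_mutex may fault and take mmap_sem. */
    record_nesting(I_MUTEX, MMAP_SEM);
    /* mprotect(): the xattr lookup for the killpriv check runs under mmap_sem. */
    return record_nesting(MMAP_SEM, xattr_lock_is_i_mutex ? I_MUTEX : XATTR_SEM);
}
```

With a dedicated xattr semaphore the two paths never nest the same pair of locks in opposite orders; treating the xattr lock as i_mutex closes the cycle.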
That being said, at least OCFS2 has other issues with this xattr locking
scheme, and they are working on changing things AFAIK. Mark, can you perhaps
comment?

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Thread overview: 15+ messages
2016-01-12 19:09 [PATCH v8] fs: clear file privilege bits when mmap writing Kees Cook
2016-01-13 9:03 ` Jan Kara
2016-01-13 16:09 ` Kees Cook
2016-01-13 20:23 ` Konstantin Khlebnikov
2016-01-13 20:33 ` Kees Cook
2016-01-14 7:35 ` Konstantin Khlebnikov
2016-01-15 10:17 ` Jan Kara
2016-01-15 10:17 ` [Ocfs2-devel] " Jan Kara