All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Daniel Colascione <dancol@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	Jann Horn <jannh@google.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	John Reck <jreck@google.com>,
	John Stultz <john.stultz@linaro.org>,
	Todd Kjos <tkjos@google.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Bruce Fields <bfields@fieldses.org>,
	Jeff Layton <jlayton@kernel.org>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	Lei.Yang@windriver.com, linux-fsdevel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Linux-MM <linux-mm@kvack.org>,
	marcandre.lureau@redhat.com,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Minchan Kim <minchan@kernel.org>, Shuah Khan <shuah@kernel.org>,
	Valdis Kletnieks <valdis.kletnieks@vt.edu>,
	Hugh Dickins <hughd@google.com>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
Date: Sun, 11 Nov 2018 07:14:33 -0800	[thread overview]
Message-ID: <91E8E1AA-859A-457A-8978-3EA39CBBF075@amacapital.net> (raw)
In-Reply-To: <CAKOZuethQ3eaV4uoEXiffVMc_S0hyk1FGPB3iQHHnv4NadW1UQ@mail.gmail.com>




> On Nov 11, 2018, at 12:30 AM, Daniel Colascione <dancol@google.com> wrote:
> 
>> On Sun, Nov 11, 2018 at 12:09 AM, Joel Fernandes <joel@joelfernandes.org> wrote:
>> On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
>> [...]
>>>>>>>>>> I see two reasonable solutions:
>>>>>>>>>> 
>>>>>>>>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>>>>>>>>> work by itself.
>>>>>>>>> 
>>>>>>>>> Currently, the various VFS paths check only the struct file's f_mode to deny
>>>>>>>>> writes of already opened files. This would mean more checking in all those
>>>>>>>>> paths (and modification of all those paths).
>>>>>>>>> 
>>>>>>>>> Anyway going with that idea, we could
>>>>>>>>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>>>>>>>>> the inode::i_writecount.
>>>>>>>>> 2. call get_write_access(inode) in the various VFS paths in addition to
>>>>>>>>> checking for FMODE_*WRITE and deny the write (incase i_writecount is negative)
>>>>>>>>> 
>>>>>>>>> That will prevent both reopens, and writes from succeeding. However I worry a
>>>>>>>>> bit about 2 not being too familiar with VFS internals, about what the
>>>>>>>>> consequences of doing that may be.
>>>>>>>> 
>>>>>>>> IMHO, modifying both the inode and the struct file separately is fine,
>>>>>>>> since they mean different things. In regular filesystems, it's fine to
>>>>>>>> have a read-write open file description for a file whose inode grants
>>>>>>>> write permission to nobody. Speaking of which: is fchmod enough to
>>>>>>>> prevent this attack?
>>>>>>> 
>>>>>>> Well, yes and no. fchmod does prevent reopening the file RW, but
>>>>>>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>>>>>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>>>>>> isn't sufficient by itself. While it might be good enough for Android
>>>>>>> (in the sense that it'll prevent RW-reopens from other security
>>>>>>> contexts to which we send an open memfd file), it's still conceptually
>>>>>>> ugly, IMHO. Let's go with the original approach of just tweaking the
>>>>>>> inode so that open-for-write is permanently blocked.
>>>>>> 
>>>>>> Agreed with the idea of modifying both file and inode flags. I was thinking
>>>>>> modifying i_mode may do the trick but as you pointed it probably could be
>>>>>> reverted by chmod or some other attribute setting calls.
>>>>>> 
>>>>>> OTOH, I don't think deny_write_access(file) can be reverted from any
>>>>>> user-facing path so we could do that from the seal to prevent the future
>>>>>> opens in write mode. I'll double check and test that out tomorrow.
>>>>>> 
>>>>>> 
>>>>> 
>>>>> This seems considerably more complicated and more fragile than needed. Just
>>>>> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>>>>> variant work exactly like it with two exceptions:
>>>>> 
>>>>> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
>>>>> accordingly.
>>>> 
>>>> There's more to it than that, we also need to block future writes through
>>>> write syscall, so we have to hook into the write path too once the seal is
>>>> set, not just the mmap. That means we have to add code in mm/shmem.c to do
>>>> that in all those handlers, to check for the seal (and hope we didn't miss a
>>>> file_operations handler). Is that what you are proposing?
>>> 
>>> The existing code already does this. That’s why I suggested grepping :)
>>> 
>>>> 
>>>> Also, it means we have to keep CONFIG_TMPFS enabled so that the
>>>> shmem_file_operations write handlers like write_iter are hooked up. Currently
>>>> memfd works even with !CONFIG_TMPFS.
>>> 
>>> If so, that sounds like it may already be a bug.
> 
> Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> kernel implementation detail leaking out into userspace.
> 
>>>>> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
>>>>> 
>>>>> That really should be all that’s needed.
>>>> 
>>>> It seems a fair idea what you're saying. But I don't see how its less
>>>> complex.. IMO its far more simple to have VFS do the denial of the operations
>>>> based on the flags of its datastructures.. and if it works (which I will test
>>>> to be sure it will), then we should be good.
>>> 
>>> I agree it’s complicated, but the code is already written.  You should just
>>> need to adjust some masks.
>>> 
>> 
>> Its actually not that bad and a great idea, I did something like the
>> following and it works pretty well. I would say its cleaner than the old
>> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
>> selftest and made sure that issue goes away).
>> 
>> Side note: One subtelty I discovered from the existing selftests is once the
>> F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
>> expected to fail. This is also evident from this code in mmap_region:
>>                if (vm_flags & VM_SHARED) {
>>                        error = mapping_map_writable(file->f_mapping);
>>                        if (error)
>>                                goto allow_write_and_free_vma;
>>                }
>> 
> 
> This behavior seems like a bug. Why should MAP_SHARED writes be denied
> here? There's no semantic incompatibility between shared mappings and
> the seal. And I think this change would represent an ABI break using
> memfd seals for ashmem, since ashmem currently allows MAP_SHARED
> mappings after changing prot_mask.

Hmm. I’m guessing the intent is that the mmap count should track writable mappings in addition to mappings that could be made writable using mprotect.  I think you could address this for SEAL_FUTURE in two ways:

1. In shmem_mmap, mask off VM_MAYWRITE if SEAL_FUTURE is set, or

2. Add a new vm operation that allows a vma to reject an mprotect attempt, like security_file_mprotect but per vma.  Then give it reasonable semantics for shmem.

(1) probably gives the semantics you want for SEAL_FUTURE: old maps can be mprotected, but new maps can’t.

> 
>> ---8<-----------------------
>> 
>> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>> Subject: [PATCH] mm/memfd: implement future write seal using shmem ops
>> 
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>> fs/hugetlbfs/inode.c |  2 +-
>> mm/memfd.c           | 19 -------------------
>> mm/shmem.c           | 13 ++++++++++---
>> 3 files changed, 11 insertions(+), 23 deletions(-)
>> 
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 32920a10100e..1978581abfdf 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>                inode_lock(inode);
>> 
>>                /* protected by i_mutex */
>> -               if (info->seals & F_SEAL_WRITE) {
>> +               if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
>>                        inode_unlock(inode);
>>                        return -EPERM;
>>                }
> 
> Maybe we can always set F_SEAL_FUTURE_WRITE when F_SEAL_WRITE so we
> can just test one bit except where the F_SEAL_WRITE behavior differs
> from F_SEAL_FUTURE_WRITE.

This could plausibly confuse existing users that read the seal mask.

WARNING: multiple messages have this Message-ID (diff)
From: luto at amacapital.net (Andy Lutomirski)
Subject: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
Date: Sun, 11 Nov 2018 07:14:33 -0800	[thread overview]
Message-ID: <91E8E1AA-859A-457A-8978-3EA39CBBF075@amacapital.net> (raw)
In-Reply-To: <CAKOZuethQ3eaV4uoEXiffVMc_S0hyk1FGPB3iQHHnv4NadW1UQ@mail.gmail.com>




> On Nov 11, 2018, at 12:30 AM, Daniel Colascione <dancol at google.com> wrote:
> 
>> On Sun, Nov 11, 2018 at 12:09 AM, Joel Fernandes <joel at joelfernandes.org> wrote:
>> On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
>> [...]
>>>>>>>>>> I see two reasonable solutions:
>>>>>>>>>> 
>>>>>>>>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>>>>>>>>> work by itself.
>>>>>>>>> 
>>>>>>>>> Currently, the various VFS paths check only the struct file's f_mode to deny
>>>>>>>>> writes of already opened files. This would mean more checking in all those
>>>>>>>>> paths (and modification of all those paths).
>>>>>>>>> 
>>>>>>>>> Anyway going with that idea, we could
>>>>>>>>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>>>>>>>>> the inode::i_writecount.
>>>>>>>>> 2. call get_write_access(inode) in the various VFS paths in addition to
>>>>>>>>> checking for FMODE_*WRITE and deny the write (incase i_writecount is negative)
>>>>>>>>> 
>>>>>>>>> That will prevent both reopens, and writes from succeeding. However I worry a
>>>>>>>>> bit about 2 not being too familiar with VFS internals, about what the
>>>>>>>>> consequences of doing that may be.
>>>>>>>> 
>>>>>>>> IMHO, modifying both the inode and the struct file separately is fine,
>>>>>>>> since they mean different things. In regular filesystems, it's fine to
>>>>>>>> have a read-write open file description for a file whose inode grants
>>>>>>>> write permission to nobody. Speaking of which: is fchmod enough to
>>>>>>>> prevent this attack?
>>>>>>> 
>>>>>>> Well, yes and no. fchmod does prevent reopening the file RW, but
>>>>>>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>>>>>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>>>>>> isn't sufficient by itself. While it might be good enough for Android
>>>>>>> (in the sense that it'll prevent RW-reopens from other security
>>>>>>> contexts to which we send an open memfd file), it's still conceptually
>>>>>>> ugly, IMHO. Let's go with the original approach of just tweaking the
>>>>>>> inode so that open-for-write is permanently blocked.
>>>>>> 
>>>>>> Agreed with the idea of modifying both file and inode flags. I was thinking
>>>>>> modifying i_mode may do the trick but as you pointed it probably could be
>>>>>> reverted by chmod or some other attribute setting calls.
>>>>>> 
>>>>>> OTOH, I don't think deny_write_access(file) can be reverted from any
>>>>>> user-facing path so we could do that from the seal to prevent the future
>>>>>> opens in write mode. I'll double check and test that out tomorrow.
>>>>>> 
>>>>>> 
>>>>> 
>>>>> This seems considerably more complicated and more fragile than needed. Just
>>>>> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>>>>> variant work exactly like it with two exceptions:
>>>>> 
>>>>> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
>>>>> accordingly.
>>>> 
>>>> There's more to it than that, we also need to block future writes through
>>>> write syscall, so we have to hook into the write path too once the seal is
>>>> set, not just the mmap. That means we have to add code in mm/shmem.c to do
>>>> that in all those handlers, to check for the seal (and hope we didn't miss a
>>>> file_operations handler). Is that what you are proposing?
>>> 
>>> The existing code already does this. That’s why I suggested grepping :)
>>> 
>>>> 
>>>> Also, it means we have to keep CONFIG_TMPFS enabled so that the
>>>> shmem_file_operations write handlers like write_iter are hooked up. Currently
>>>> memfd works even with !CONFIG_TMPFS.
>>> 
>>> If so, that sounds like it may already be a bug.
> 
> Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> kernel implementation detail leaking out into userspace.
> 
>>>>> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
>>>>> 
>>>>> That really should be all that’s needed.
>>>> 
>>>> It seems a fair idea what you're saying. But I don't see how its less
>>>> complex.. IMO its far more simple to have VFS do the denial of the operations
>>>> based on the flags of its datastructures.. and if it works (which I will test
>>>> to be sure it will), then we should be good.
>>> 
>>> I agree it’s complicated, but the code is already written.  You should just
>>> need to adjust some masks.
>>> 
>> 
>> Its actually not that bad and a great idea, I did something like the
>> following and it works pretty well. I would say its cleaner than the old
>> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
>> selftest and made sure that issue goes away).
>> 
>> Side note: One subtelty I discovered from the existing selftests is once the
>> F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
>> expected to fail. This is also evident from this code in mmap_region:
>>                if (vm_flags & VM_SHARED) {
>>                        error = mapping_map_writable(file->f_mapping);
>>                        if (error)
>>                                goto allow_write_and_free_vma;
>>                }
>> 
> 
> This behavior seems like a bug. Why should MAP_SHARED writes be denied
> here? There's no semantic incompatibility between shared mappings and
> the seal. And I think this change would represent an ABI break using
> memfd seals for ashmem, since ashmem currently allows MAP_SHARED
> mappings after changing prot_mask.

Hmm. I’m guessing the intent is that the mmap count should track writable mappings in addition to mappings that could be made writable using mprotect.  I think you could address this for SEAL_FUTURE in two ways:

1. In shmem_mmap, mask off VM_MAYWRITE if SEAL_FUTURE is set, or

2. Add a new vm operation that allows a vma to reject an mprotect attempt, like security_file_mprotect but per vma.  Then give it reasonable semantics for shmem.

(1) probably gives the semantics you want for SEAL_FUTURE: old maps can be mprotected, but new maps can’t.

> 
>> ---8<-----------------------
>> 
>> From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
>> Subject: [PATCH] mm/memfd: implement future write seal using shmem ops
>> 
>> Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
>> ---
>> fs/hugetlbfs/inode.c |  2 +-
>> mm/memfd.c           | 19 -------------------
>> mm/shmem.c           | 13 ++++++++++---
>> 3 files changed, 11 insertions(+), 23 deletions(-)
>> 
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 32920a10100e..1978581abfdf 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>                inode_lock(inode);
>> 
>>                /* protected by i_mutex */
>> -               if (info->seals & F_SEAL_WRITE) {
>> +               if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
>>                        inode_unlock(inode);
>>                        return -EPERM;
>>                }
> 
> Maybe we can always set F_SEAL_FUTURE_WRITE when F_SEAL_WRITE so we
> can just test one bit except where the F_SEAL_WRITE behavior differs
> from F_SEAL_FUTURE_WRITE.

This could plausibly confuse existing users that read the seal mask.

WARNING: multiple messages have this Message-ID (diff)
From: luto@amacapital.net (Andy Lutomirski)
Subject: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
Date: Sun, 11 Nov 2018 07:14:33 -0800	[thread overview]
Message-ID: <91E8E1AA-859A-457A-8978-3EA39CBBF075@amacapital.net> (raw)
Message-ID: <20181111151433.QogbtnDMTRJYx8dxrJ3v4yeYq1Evm9ezhphB-f6kC1o@z> (raw)
In-Reply-To: <CAKOZuethQ3eaV4uoEXiffVMc_S0hyk1FGPB3iQHHnv4NadW1UQ@mail.gmail.com>




> On Nov 11, 2018,@12:30 AM, Daniel Colascione <dancol@google.com> wrote:
> 
>> On Sun, Nov 11, 2018@12:09 AM, Joel Fernandes <joel@joelfernandes.org> wrote:
>> On Sat, Nov 10, 2018@07:40:10PM -0800, Andy Lutomirski wrote:
>> [...]
>>>>>>>>>> I see two reasonable solutions:
>>>>>>>>>> 
>>>>>>>>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>>>>>>>>> work by itself.
>>>>>>>>> 
>>>>>>>>> Currently, the various VFS paths check only the struct file's f_mode to deny
>>>>>>>>> writes of already opened files. This would mean more checking in all those
>>>>>>>>> paths (and modification of all those paths).
>>>>>>>>> 
>>>>>>>>> Anyway going with that idea, we could
>>>>>>>>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>>>>>>>>> the inode::i_writecount.
>>>>>>>>> 2. call get_write_access(inode) in the various VFS paths in addition to
>>>>>>>>> checking for FMODE_*WRITE and deny the write (incase i_writecount is negative)
>>>>>>>>> 
>>>>>>>>> That will prevent both reopens, and writes from succeeding. However I worry a
>>>>>>>>> bit about 2 not being too familiar with VFS internals, about what the
>>>>>>>>> consequences of doing that may be.
>>>>>>>> 
>>>>>>>> IMHO, modifying both the inode and the struct file separately is fine,
>>>>>>>> since they mean different things. In regular filesystems, it's fine to
>>>>>>>> have a read-write open file description for a file whose inode grants
>>>>>>>> write permission to nobody. Speaking of which: is fchmod enough to
>>>>>>>> prevent this attack?
>>>>>>> 
>>>>>>> Well, yes and no. fchmod does prevent reopening the file RW, but
>>>>>>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>>>>>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>>>>>> isn't sufficient by itself. While it might be good enough for Android
>>>>>>> (in the sense that it'll prevent RW-reopens from other security
>>>>>>> contexts to which we send an open memfd file), it's still conceptually
>>>>>>> ugly, IMHO. Let's go with the original approach of just tweaking the
>>>>>>> inode so that open-for-write is permanently blocked.
>>>>>> 
>>>>>> Agreed with the idea of modifying both file and inode flags. I was thinking
>>>>>> modifying i_mode may do the trick but as you pointed it probably could be
>>>>>> reverted by chmod or some other attribute setting calls.
>>>>>> 
>>>>>> OTOH, I don't think deny_write_access(file) can be reverted from any
>>>>>> user-facing path so we could do that from the seal to prevent the future
>>>>>> opens in write mode. I'll double check and test that out tomorrow.
>>>>>> 
>>>>>> 
>>>>> 
>>>>> This seems considerably more complicated and more fragile than needed. Just
>>>>> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>>>>> variant work exactly like it with two exceptions:
>>>>> 
>>>>> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
>>>>> accordingly.
>>>> 
>>>> There's more to it than that, we also need to block future writes through
>>>> write syscall, so we have to hook into the write path too once the seal is
>>>> set, not just the mmap. That means we have to add code in mm/shmem.c to do
>>>> that in all those handlers, to check for the seal (and hope we didn't miss a
>>>> file_operations handler). Is that what you are proposing?
>>> 
>>> The existing code already does this. That’s why I suggested grepping :)
>>> 
>>>> 
>>>> Also, it means we have to keep CONFIG_TMPFS enabled so that the
>>>> shmem_file_operations write handlers like write_iter are hooked up. Currently
>>>> memfd works even with !CONFIG_TMPFS.
>>> 
>>> If so, that sounds like it may already be a bug.
> 
> Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> kernel implementation detail leaking out into userspace.
> 
>>>>> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
>>>>> 
>>>>> That really should be all that’s needed.
>>>> 
>>>> It seems a fair idea what you're saying. But I don't see how its less
>>>> complex.. IMO its far more simple to have VFS do the denial of the operations
>>>> based on the flags of its datastructures.. and if it works (which I will test
>>>> to be sure it will), then we should be good.
>>> 
>>> I agree it’s complicated, but the code is already written.  You should just
>>> need to adjust some masks.
>>> 
>> 
>> Its actually not that bad and a great idea, I did something like the
>> following and it works pretty well. I would say its cleaner than the old
>> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
>> selftest and made sure that issue goes away).
>> 
>> Side note: One subtelty I discovered from the existing selftests is once the
>> F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
>> expected to fail. This is also evident from this code in mmap_region:
>>                if (vm_flags & VM_SHARED) {
>>                        error = mapping_map_writable(file->f_mapping);
>>                        if (error)
>>                                goto allow_write_and_free_vma;
>>                }
>> 
> 
> This behavior seems like a bug. Why should MAP_SHARED writes be denied
> here? There's no semantic incompatibility between shared mappings and
> the seal. And I think this change would represent an ABI break using
> memfd seals for ashmem, since ashmem currently allows MAP_SHARED
> mappings after changing prot_mask.

Hmm. I’m guessing the intent is that the mmap count should track writable mappings in addition to mappings that could be made writable using mprotect.  I think you could address this for SEAL_FUTURE in two ways:

1. In shmem_mmap, mask off VM_MAYWRITE if SEAL_FUTURE is set, or

2. Add a new vm operation that allows a vma to reject an mprotect attempt, like security_file_mprotect but per vma.  Then give it reasonable semantics for shmem.

(1) probably gives the semantics you want for SEAL_FUTURE: old maps can be mprotected, but new maps can’t.

> 
>> ---8<-----------------------
>> 
>> From: "Joel Fernandes (Google)" <joel at joelfernandes.org>
>> Subject: [PATCH] mm/memfd: implement future write seal using shmem ops
>> 
>> Signed-off-by: Joel Fernandes (Google) <joel at joelfernandes.org>
>> ---
>> fs/hugetlbfs/inode.c |  2 +-
>> mm/memfd.c           | 19 -------------------
>> mm/shmem.c           | 13 ++++++++++---
>> 3 files changed, 11 insertions(+), 23 deletions(-)
>> 
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 32920a10100e..1978581abfdf 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>                inode_lock(inode);
>> 
>>                /* protected by i_mutex */
>> -               if (info->seals & F_SEAL_WRITE) {
>> +               if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
>>                        inode_unlock(inode);
>>                        return -EPERM;
>>                }
> 
> Maybe we can always set F_SEAL_FUTURE_WRITE when F_SEAL_WRITE so we
> can just test one bit except where the F_SEAL_WRITE behavior differs
> from F_SEAL_FUTURE_WRITE.

This could plausibly confuse existing users that read the seal mask.

WARNING: multiple messages have this Message-ID (diff)
From: Andy Lutomirski <luto@amacapital.net>
To: Daniel Colascione <dancol@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	Jann Horn <jannh@google.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	John Reck <jreck@google.com>,
	John Stultz <john.stultz@linaro.org>,
	Todd Kjos <tkjos@google.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Bruce Fields <bfields@fieldses.org>,
	Jeff Layton <jlayton@kernel.org>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	Lei.Yang@windriver.com, linux-fsdevel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Linux-MM <linux-mm@kvack.org>,
	marcandre.lureau@redhat.com,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Minchan Kim <minchan@kernel.org>, Shuah Khan <shuah@kernel.org>,
	Valdis
Subject: Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
Date: Sun, 11 Nov 2018 07:14:33 -0800	[thread overview]
Message-ID: <91E8E1AA-859A-457A-8978-3EA39CBBF075@amacapital.net> (raw)
In-Reply-To: <CAKOZuethQ3eaV4uoEXiffVMc_S0hyk1FGPB3iQHHnv4NadW1UQ@mail.gmail.com>




> On Nov 11, 2018, at 12:30 AM, Daniel Colascione <dancol@google.com> wrote:
> 
>> On Sun, Nov 11, 2018 at 12:09 AM, Joel Fernandes <joel@joelfernandes.org> wrote:
>> On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
>> [...]
>>>>>>>>>> I see two reasonable solutions:
>>>>>>>>>> 
>>>>>>>>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>>>>>>>>> work by itself.
>>>>>>>>> 
>>>>>>>>> Currently, the various VFS paths check only the struct file's f_mode to deny
>>>>>>>>> writes of already opened files. This would mean more checking in all those
>>>>>>>>> paths (and modification of all those paths).
>>>>>>>>> 
>>>>>>>>> Anyway going with that idea, we could
>>>>>>>>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>>>>>>>>> the inode::i_writecount.
>>>>>>>>> 2. call get_write_access(inode) in the various VFS paths in addition to
>>>>>>>>> checking for FMODE_*WRITE and deny the write (incase i_writecount is negative)
>>>>>>>>> 
>>>>>>>>> That will prevent both reopens, and writes from succeeding. However I worry a
>>>>>>>>> bit about 2 not being too familiar with VFS internals, about what the
>>>>>>>>> consequences of doing that may be.
>>>>>>>> 
>>>>>>>> IMHO, modifying both the inode and the struct file separately is fine,
>>>>>>>> since they mean different things. In regular filesystems, it's fine to
>>>>>>>> have a read-write open file description for a file whose inode grants
>>>>>>>> write permission to nobody. Speaking of which: is fchmod enough to
>>>>>>>> prevent this attack?
>>>>>>> 
>>>>>>> Well, yes and no. fchmod does prevent reopening the file RW, but
>>>>>>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>>>>>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>>>>>> isn't sufficient by itself. While it might be good enough for Android
>>>>>>> (in the sense that it'll prevent RW-reopens from other security
>>>>>>> contexts to which we send an open memfd file), it's still conceptually
>>>>>>> ugly, IMHO. Let's go with the original approach of just tweaking the
>>>>>>> inode so that open-for-write is permanently blocked.
>>>>>> 
>>>>>> Agreed with the idea of modifying both file and inode flags. I was thinking
>>>>>> modifying i_mode may do the trick but as you pointed it probably could be
>>>>>> reverted by chmod or some other attribute setting calls.
>>>>>> 
>>>>>> OTOH, I don't think deny_write_access(file) can be reverted from any
>>>>>> user-facing path so we could do that from the seal to prevent the future
>>>>>> opens in write mode. I'll double check and test that out tomorrow.
>>>>>> 
>>>>>> 
>>>>> 
>>>>> This seems considerably more complicated and more fragile than needed. Just
>>>>> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>>>>> variant work exactly like it with two exceptions:
>>>>> 
>>>>> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
>>>>> accordingly.
>>>> 
>>>> There's more to it than that, we also need to block future writes through
>>>> write syscall, so we have to hook into the write path too once the seal is
>>>> set, not just the mmap. That means we have to add code in mm/shmem.c to do
>>>> that in all those handlers, to check for the seal (and hope we didn't miss a
>>>> file_operations handler). Is that what you are proposing?
>>> 
>>> The existing code already does this. That’s why I suggested grepping :)
>>> 
>>>> 
>>>> Also, it means we have to keep CONFIG_TMPFS enabled so that the
>>>> shmem_file_operations write handlers like write_iter are hooked up. Currently
>>>> memfd works even with !CONFIG_TMPFS.
>>> 
>>> If so, that sounds like it may already be a bug.
> 
> Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> kernel implementation detail leaking out into userspace.
> 
>>>>> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
>>>>> 
>>>>> That really should be all that’s needed.
>>>> 
>>>> It seems a fair idea what you're saying. But I don't see how its less
>>>> complex.. IMO its far more simple to have VFS do the denial of the operations
>>>> based on the flags of its datastructures.. and if it works (which I will test
>>>> to be sure it will), then we should be good.
>>> 
>>> I agree it’s complicated, but the code is already written.  You should just
>>> need to adjust some masks.
>>> 
>> 
>> Its actually not that bad and a great idea, I did something like the
>> following and it works pretty well. I would say its cleaner than the old
>> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
>> selftest and made sure that issue goes away).
>> 
>> Side note: One subtelty I discovered from the existing selftests is once the
>> F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
>> expected to fail. This is also evident from this code in mmap_region:
>>                if (vm_flags & VM_SHARED) {
>>                        error = mapping_map_writable(file->f_mapping);
>>                        if (error)
>>                                goto allow_write_and_free_vma;
>>                }
>> 
> 
> This behavior seems like a bug. Why should MAP_SHARED writes be denied
> here? There's no semantic incompatibility between shared mappings and
> the seal. And I think this change would represent an ABI break using
> memfd seals for ashmem, since ashmem currently allows MAP_SHARED
> mappings after changing prot_mask.

Hmm. I’m guessing the intent is that the mmap count should track writable mappings in addition to mappings that could be made writable using mprotect.  I think you could address this for SEAL_FUTURE in two ways:

1. In shmem_mmap, mask off VM_MAYWRITE if SEAL_FUTURE is set, or

2. Add a new vm operation that allows a vma to reject an mprotect attempt, like security_file_mprotect but per vma.  Then give it reasonable semantics for shmem.

(1) probably gives the semantics you want for SEAL_FUTURE: old maps can be mprotected, but new maps can’t.

> 
>> ---8<-----------------------
>> 
>> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>> Subject: [PATCH] mm/memfd: implement future write seal using shmem ops
>> 
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>> fs/hugetlbfs/inode.c |  2 +-
>> mm/memfd.c           | 19 -------------------
>> mm/shmem.c           | 13 ++++++++++---
>> 3 files changed, 11 insertions(+), 23 deletions(-)
>> 
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 32920a10100e..1978581abfdf 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>                inode_lock(inode);
>> 
>>                /* protected by i_mutex */
>> -               if (info->seals & F_SEAL_WRITE) {
>> +               if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
>>                        inode_unlock(inode);
>>                        return -EPERM;
>>                }
> 
> Maybe we can always set F_SEAL_FUTURE_WRITE when F_SEAL_WRITE so we
> can just test one bit except where the F_SEAL_WRITE behavior differs
> from F_SEAL_FUTURE_WRITE.

This could plausibly confuse existing users that read the seal mask.

  reply	other threads:[~2018-11-11 15:14 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-08  4:15 [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd Joel Fernandes (Google)
2018-11-08  4:15 ` Joel Fernandes (Google)
2018-11-08  4:15 ` joel
2018-11-08  4:15 ` [PATCH v3 resend 2/2] selftests/memfd: Add tests for F_SEAL_FUTURE_WRITE seal Joel Fernandes (Google)
2018-11-08  4:15   ` Joel Fernandes (Google)
2018-11-08  4:15   ` joel
2018-11-09  8:49 ` [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd Joel Fernandes
2018-11-09  8:49   ` Joel Fernandes
2018-11-09  8:49   ` joel
2018-11-09 20:36 ` Andrew Morton
2018-11-09 20:36   ` Andrew Morton
2018-11-09 20:36   ` akpm
2018-11-10  3:54   ` Joel Fernandes
2018-11-10  3:54     ` Joel Fernandes
2018-11-10  3:54     ` joel
2018-11-09 21:06 ` Jann Horn
2018-11-09 21:06   ` Jann Horn
2018-11-09 21:06   ` jannh
2018-11-09 21:19   ` Jann Horn
2018-11-09 21:19     ` Jann Horn
2018-11-09 21:19     ` jannh
2018-11-10  3:20     ` Joel Fernandes
2018-11-10  3:20       ` Joel Fernandes
2018-11-10  3:20       ` joel
2018-11-10  6:05       ` Andy Lutomirski
2018-11-10  6:05         ` Andy Lutomirski
2018-11-10  6:05         ` Andy Lutomirski
2018-11-10  6:05         ` luto
2018-11-10 18:24         ` Joel Fernandes
2018-11-10 18:24           ` Joel Fernandes
2018-11-10 18:24           ` Joel Fernandes
2018-11-10 18:24           ` Joel Fernandes
2018-11-10 18:24           ` joel
2018-11-10 18:45           ` Daniel Colascione
2018-11-10 18:45             ` Daniel Colascione
2018-11-10 18:45             ` Daniel Colascione
2018-11-10 18:45             ` dancol
2018-11-10 19:11             ` Daniel Colascione
2018-11-10 19:11               ` Daniel Colascione
2018-11-10 19:11               ` Daniel Colascione
2018-11-10 19:11               ` dancol
2018-11-10 19:55               ` Andy Lutomirski
2018-11-10 19:55                 ` Andy Lutomirski
2018-11-10 19:55                 ` Andy Lutomirski
2018-11-10 19:55                 ` luto
2018-11-10 22:09               ` Joel Fernandes
2018-11-10 22:09                 ` Joel Fernandes
2018-11-10 22:09                 ` Joel Fernandes
2018-11-10 22:09                 ` Joel Fernandes
2018-11-10 22:09                 ` joel
2018-11-10 22:18                 ` Andy Lutomirski
2018-11-10 22:18                   ` Andy Lutomirski
2018-11-10 22:18                   ` Andy Lutomirski
2018-11-10 22:18                   ` luto
2018-11-11  2:38                   ` Joel Fernandes
2018-11-11  2:38                     ` Joel Fernandes
2018-11-11  2:38                     ` Joel Fernandes
2018-11-11  2:38                     ` Joel Fernandes
2018-11-11  2:38                     ` joel
2018-11-11  3:40                     ` Andy Lutomirski
2018-11-11  3:40                       ` Andy Lutomirski
2018-11-11  3:40                       ` Andy Lutomirski
2018-11-11  3:40                       ` luto
2018-11-11  4:01                       ` Joel Fernandes
2018-11-11  4:01                         ` Joel Fernandes
2018-11-11  4:01                         ` Joel Fernandes
2018-11-11  4:01                         ` Joel Fernandes
2018-11-11  4:01                         ` joel
2018-11-11  8:09                       ` Joel Fernandes
2018-11-11  8:09                         ` Joel Fernandes
2018-11-11  8:09                         ` Joel Fernandes
2018-11-11  8:09                         ` Joel Fernandes
2018-11-11  8:09                         ` joel
2018-11-11  8:30                         ` Daniel Colascione
2018-11-11  8:30                           ` Daniel Colascione
2018-11-11  8:30                           ` Daniel Colascione
2018-11-11  8:30                           ` dancol
2018-11-11 15:14                           ` Andy Lutomirski [this message]
2018-11-11 15:14                             ` Andy Lutomirski
2018-11-11 15:14                             ` Andy Lutomirski
2018-11-11 15:14                             ` luto
2018-11-11 17:36                             ` Joel Fernandes
2018-11-11 17:36                               ` Joel Fernandes
2018-11-11 17:36                               ` Joel Fernandes
2018-11-11 17:36                               ` Joel Fernandes
2018-11-11 17:36                               ` joel
2018-11-10 12:26       ` Daniel Colascione
2018-11-10 17:10         ` Joel Fernandes
2018-11-10 17:10           ` Joel Fernandes
2018-11-10 17:10           ` Joel Fernandes
2018-11-10 17:10           ` joel
2018-11-09 21:40   ` Andy Lutomirski
2018-11-09 21:40     ` Andy Lutomirski
2018-11-09 21:40     ` luto
2018-11-09 20:02     ` Michael Tirado
2018-11-09 20:02       ` Michael Tirado
2018-11-09 20:02       ` mtirado418
2018-11-10  1:49       ` Joel Fernandes
2018-11-10  1:49         ` Joel Fernandes
2018-11-10  1:49         ` joel
2018-11-09 22:20   ` Daniel Colascione
2018-11-09 22:20     ` Daniel Colascione
2018-11-09 22:20     ` Daniel Colascione
2018-11-09 22:20     ` dancol
2018-11-09 22:37     ` Andy Lutomirski
2018-11-09 22:37       ` Andy Lutomirski
2018-11-09 22:37       ` Andy Lutomirski
2018-11-09 22:37       ` luto
2018-11-09 22:42       ` Daniel Colascione
2018-11-09 22:42         ` Daniel Colascione
2018-11-09 22:42         ` Daniel Colascione
2018-11-09 22:42         ` dancol
2018-11-09 23:14         ` Andy Lutomirski
2018-11-09 23:14           ` Andy Lutomirski
2018-11-09 23:14           ` Andy Lutomirski
2018-11-09 23:14           ` luto
2018-11-10  1:36           ` Joel Fernandes
2018-11-10  1:36             ` Joel Fernandes
2018-11-10  1:36             ` Joel Fernandes
2018-11-10  1:36             ` Joel Fernandes
2018-11-10  1:36             ` joel
2018-11-09 23:46   ` Joel Fernandes
2018-11-09 23:46     ` Joel Fernandes
2018-11-09 23:46     ` joel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91E8E1AA-859A-457A-8978-3EA39CBBF075@amacapital.net \
    --to=luto@amacapital.net \
    --cc=Lei.Yang@windriver.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=dancol@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=joel@joelfernandes.org \
    --cc=john.stultz@linaro.org \
    --cc=jreck@google.com \
    --cc=khalid.aziz@oracle.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=marcandre.lureau@redhat.com \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=shuah@kernel.org \
    --cc=tkjos@google.com \
    --cc=valdis.kletnieks@vt.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.