* [PATCH v3 1/3] mm: drop the assumption that VM_SHARED always implies writable
2023-10-07 20:50 [PATCH v3 0/3] permit write-sealed memfd read-only shared mappings Lorenzo Stoakes
@ 2023-10-07 20:50 ` Lorenzo Stoakes
2023-10-07 20:51 ` [PATCH v3 2/3] mm: update memfd seal write check to include F_SEAL_WRITE Lorenzo Stoakes
2023-10-07 20:51 ` [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap() Lorenzo Stoakes
2 siblings, 0 replies; 7+ messages in thread
From: Lorenzo Stoakes @ 2023-10-07 20:50 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Mike Kravetz, Muchun Song, Alexander Viro, Christian Brauner,
Matthew Wilcox, Hugh Dickins, Andy Lutomirski, Jan Kara,
linux-fsdevel, bpf, Lorenzo Stoakes
There is a general assumption that VMAs with the VM_SHARED flag set are
writable. If the VM_MAYWRITE flag is not set, then this is simply not the
case.
Update those checks which affect the struct address_space->i_mmap_writable
field to explicitly test for this by introducing [vma_]is_shared_maywrite()
helper functions.
This remains entirely conservative, as the lack of VM_MAYWRITE guarantees
that the VMA cannot be written to.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
include/linux/fs.h | 4 ++--
include/linux/mm.h | 11 +++++++++++
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/mmap.c | 12 ++++++------
6 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 92a9c6157de1..e9c03fb00d5c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -454,7 +454,7 @@ extern const struct address_space_operations empty_aops;
* It is also used to block modification of page cache contents through
* memory mappings.
* @gfp_mask: Memory allocation flags to use for allocating pages.
- * @i_mmap_writable: Number of VM_SHARED mappings.
+ * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
* @nr_thps: Number of THPs in the pagecache (non-shmem only).
* @i_mmap: Tree of private and shared mappings.
* @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
@@ -557,7 +557,7 @@ static inline int mapping_mapped(struct address_space *mapping)
/*
* Might pages of this file have been modified in userspace?
- * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap
+ * Note that i_mmap_writable counts all VM_SHARED, VM_MAYWRITE vmas: do_mmap
* marks vma as VM_SHARED if it is shared, and the file was opened for
* writing i.e. vma may be mprotected writable even if now readonly.
*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7b667786cde..c9e9628addc4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -937,6 +937,17 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
return vma->vm_flags & VM_ACCESS_FLAGS;
}
+static inline bool is_shared_maywrite(vm_flags_t vm_flags)
+{
+ return (vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
+ (VM_SHARED | VM_MAYWRITE);
+}
+
+static inline bool vma_is_shared_maywrite(struct vm_area_struct *vma)
+{
+ return is_shared_maywrite(vma->vm_flags);
+}
+
static inline
struct vm_area_struct *vma_find(struct vma_iterator *vmi, unsigned long max)
{
diff --git a/kernel/fork.c b/kernel/fork.c
index e45a4457ba83..1e6c656e0857 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -733,7 +733,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
get_file(file);
i_mmap_lock_write(mapping);
- if (tmp->vm_flags & VM_SHARED)
+ if (vma_is_shared_maywrite(tmp))
mapping_allow_writable(mapping);
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
diff --git a/mm/filemap.c b/mm/filemap.c
index 48cd16c54e86..ad559f94e125 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3637,7 +3637,7 @@ int generic_file_mmap(struct file *file, struct vm_area_struct *vma)
*/
int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
{
- if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+ if (vma_is_shared_maywrite(vma))
return -EINVAL;
return generic_file_mmap(file, vma);
}
diff --git a/mm/madvise.c b/mm/madvise.c
index a4a20de50494..5c74fb645de8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -999,7 +999,7 @@ static long madvise_remove(struct vm_area_struct *vma,
return -EINVAL;
}
- if ((vma->vm_flags & (VM_SHARED|VM_WRITE)) != (VM_SHARED|VM_WRITE))
+ if (!vma_is_shared_maywrite(vma))
return -EACCES;
offset = (loff_t)(start - vma->vm_start)
diff --git a/mm/mmap.c b/mm/mmap.c
index 673429ee8a9e..6f6856b3267a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -107,7 +107,7 @@ void vma_set_page_prot(struct vm_area_struct *vma)
static void __remove_shared_vm_struct(struct vm_area_struct *vma,
struct file *file, struct address_space *mapping)
{
- if (vma->vm_flags & VM_SHARED)
+ if (vma_is_shared_maywrite(vma))
mapping_unmap_writable(mapping);
flush_dcache_mmap_lock(mapping);
@@ -384,7 +384,7 @@ static unsigned long count_vma_pages_range(struct mm_struct *mm,
static void __vma_link_file(struct vm_area_struct *vma,
struct address_space *mapping)
{
- if (vma->vm_flags & VM_SHARED)
+ if (vma_is_shared_maywrite(vma))
mapping_allow_writable(mapping);
flush_dcache_mmap_lock(mapping);
@@ -2767,7 +2767,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma->vm_pgoff = pgoff;
if (file) {
- if (vm_flags & VM_SHARED) {
+ if (is_shared_maywrite(vm_flags)) {
error = mapping_map_writable(file->f_mapping);
if (error)
goto free_vma;
@@ -2842,7 +2842,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
mm->map_count++;
if (vma->vm_file) {
i_mmap_lock_write(vma->vm_file->f_mapping);
- if (vma->vm_flags & VM_SHARED)
+ if (vma_is_shared_maywrite(vma))
mapping_allow_writable(vma->vm_file->f_mapping);
flush_dcache_mmap_lock(vma->vm_file->f_mapping);
@@ -2859,7 +2859,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Once vma denies write, undo our temporary denial count */
unmap_writable:
- if (file && vm_flags & VM_SHARED)
+ if (file && is_shared_maywrite(vm_flags))
mapping_unmap_writable(file->f_mapping);
file = vma->vm_file;
ksm_add_vma(vma);
@@ -2907,7 +2907,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
vma->vm_end, vma->vm_end, true);
}
- if (file && (vm_flags & VM_SHARED))
+ if (file && is_shared_maywrite(vm_flags))
mapping_unmap_writable(file->f_mapping);
free_vma:
vm_area_free(vma);
--
2.42.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 2/3] mm: update memfd seal write check to include F_SEAL_WRITE
2023-10-07 20:50 [PATCH v3 0/3] permit write-sealed memfd read-only shared mappings Lorenzo Stoakes
2023-10-07 20:50 ` [PATCH v3 1/3] mm: drop the assumption that VM_SHARED always implies writable Lorenzo Stoakes
@ 2023-10-07 20:51 ` Lorenzo Stoakes
2023-10-07 20:51 ` [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap() Lorenzo Stoakes
2 siblings, 0 replies; 7+ messages in thread
From: Lorenzo Stoakes @ 2023-10-07 20:51 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Mike Kravetz, Muchun Song, Alexander Viro, Christian Brauner,
Matthew Wilcox, Hugh Dickins, Andy Lutomirski, Jan Kara,
linux-fsdevel, bpf, Lorenzo Stoakes
The seal_check_future_write() function is called by shmem_mmap() or
hugetlbfs_file_mmap() to disallow any future writable mappings of an memfd
sealed this way.
The F_SEAL_WRITE flag is not checked here, as that is handled via the
mapping->i_mmap_writable mechanism and so any attempt at a mapping would
fail before this could be run.
However we intend to change this, meaning this check can be performed for
F_SEAL_WRITE mappings also.
The logic here is equally applicable to both flags, so update this function
to accommodate both and rename it accordingly.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
fs/hugetlbfs/inode.c | 2 +-
include/linux/mm.h | 15 ++++++++-------
mm/shmem.c | 2 +-
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 06693bb1153d..5c333373dcc9 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -112,7 +112,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
vma->vm_ops = &hugetlb_vm_ops;
- ret = seal_check_future_write(info->seals, vma);
+ ret = seal_check_write(info->seals, vma);
if (ret)
return ret;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c9e9628addc4..51a217ed4d1b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4027,25 +4027,26 @@ static inline void mem_dump_obj(void *object) {}
#endif
/**
- * seal_check_future_write - Check for F_SEAL_FUTURE_WRITE flag and handle it
+ * seal_check_write - Check for F_SEAL_WRITE or F_SEAL_FUTURE_WRITE flags and
+ * handle them.
* @seals: the seals to check
* @vma: the vma to operate on
*
- * Check whether F_SEAL_FUTURE_WRITE is set; if so, do proper check/handling on
- * the vma flags. Return 0 if check pass, or <0 for errors.
+ * Check whether F_SEAL_WRITE or F_SEAL_FUTURE_WRITE are set; if so, do proper
+ * check/handling on the vma flags. Return 0 if check pass, or <0 for errors.
*/
-static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
+static inline int seal_check_write(int seals, struct vm_area_struct *vma)
{
- if (seals & F_SEAL_FUTURE_WRITE) {
+ if (seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
/*
* New PROT_WRITE and MAP_SHARED mmaps are not allowed when
- * "future write" seal active.
+ * write seals are active.
*/
if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
return -EPERM;
/*
- * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+ * Since an F_SEAL_[FUTURE_]WRITE sealed memfd can be mapped as
* MAP_SHARED and read-only, take care to not allow mprotect to
* revert protections on such mappings. Do this only for shared
* mappings. For private mappings, don't need to mask
diff --git a/mm/shmem.c b/mm/shmem.c
index 6503910b0f54..cab053831fea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2405,7 +2405,7 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
struct shmem_inode_info *info = SHMEM_I(inode);
int ret;
- ret = seal_check_future_write(info->seals, vma);
+ ret = seal_check_write(info->seals, vma);
if (ret)
return ret;
--
2.42.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap()
2023-10-07 20:50 [PATCH v3 0/3] permit write-sealed memfd read-only shared mappings Lorenzo Stoakes
2023-10-07 20:50 ` [PATCH v3 1/3] mm: drop the assumption that VM_SHARED always implies writable Lorenzo Stoakes
2023-10-07 20:51 ` [PATCH v3 2/3] mm: update memfd seal write check to include F_SEAL_WRITE Lorenzo Stoakes
@ 2023-10-07 20:51 ` Lorenzo Stoakes
2023-10-11 9:46 ` Jan Kara
2 siblings, 1 reply; 7+ messages in thread
From: Lorenzo Stoakes @ 2023-10-07 20:51 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Mike Kravetz, Muchun Song, Alexander Viro, Christian Brauner,
Matthew Wilcox, Hugh Dickins, Andy Lutomirski, Jan Kara,
linux-fsdevel, bpf, Lorenzo Stoakes
In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to
clear VM_MAYWRITE in seal_check_write() we must be able to invoke either
the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so.
We would otherwise fail the mapping_map_writable() check before we had
the opportunity to clear VM_MAYWRITE.
However, the existing logic in mmap_region() performs this check BEFORE
calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce
this check AFTER the function call.
In order to avoid any risk of breaking call_mmap() handlers which assume
this will have been done first, we continue to mark the file writable
first, simply deferring enforcement of it failing until afterwards.
This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's
sealed via F_SEAL_WRITE to succeed, whereas previously they were not
permitted.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
mm/mmap.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 6f6856b3267a..9fbee92aaaee 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma->vm_pgoff = pgoff;
if (file) {
- if (is_shared_maywrite(vm_flags)) {
- error = mapping_map_writable(file->f_mapping);
- if (error)
- goto free_vma;
- }
+ int writable_error = 0;
+
+ if (vma_is_shared_maywrite(vma))
+ writable_error = mapping_map_writable(file->f_mapping);
vma->vm_file = get_file(file);
error = call_mmap(file, vma);
if (error)
goto unmap_and_free_vma;
+ /*
+ * call_mmap() may have changed VMA flags, so retry this check
+ * if it failed before.
+ */
+ if (writable_error && vma_is_shared_maywrite(vma)) {
+ error = writable_error;
+ goto close_and_free_vma;
+ }
+
/*
* Expansion is handled above, merging is handled below.
* Drivers should not alter the address of the VMA.
--
2.42.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap()
2023-10-07 20:51 ` [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap() Lorenzo Stoakes
@ 2023-10-11 9:46 ` Jan Kara
2023-10-11 18:14 ` Lorenzo Stoakes
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2023-10-11 9:46 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-mm, linux-kernel, Andrew Morton, Mike Kravetz, Muchun Song,
Alexander Viro, Christian Brauner, Matthew Wilcox, Hugh Dickins,
Andy Lutomirski, Jan Kara, linux-fsdevel, bpf
On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote:
> In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to
> clear VM_MAYWRITE in seal_check_write() we must be able to invoke either
> the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so.
>
> We would otherwise fail the mapping_map_writable() check before we had
> the opportunity to clear VM_MAYWRITE.
>
> However, the existing logic in mmap_region() performs this check BEFORE
> calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce
> this check AFTER the function call.
>
> In order to avoid any risk of breaking call_mmap() handlers which assume
> this will have been done first, we continue to mark the file writable
> first, simply deferring enforcement of it failing until afterwards.
>
> This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's
> sealed via F_SEAL_WRITE to succeed, whereas previously they were not
> permitted.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
...
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6f6856b3267a..9fbee92aaaee 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> vma->vm_pgoff = pgoff;
>
> if (file) {
> - if (is_shared_maywrite(vm_flags)) {
> - error = mapping_map_writable(file->f_mapping);
> - if (error)
> - goto free_vma;
> - }
> + int writable_error = 0;
> +
> + if (vma_is_shared_maywrite(vma))
> + writable_error = mapping_map_writable(file->f_mapping);
>
> vma->vm_file = get_file(file);
> error = call_mmap(file, vma);
> if (error)
> goto unmap_and_free_vma;
>
> + /*
> + * call_mmap() may have changed VMA flags, so retry this check
> + * if it failed before.
> + */
> + if (writable_error && vma_is_shared_maywrite(vma)) {
> + error = writable_error;
> + goto close_and_free_vma;
> + }
Hum, this doesn't quite give me a peace of mind ;). One bug I can see is
that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop
i_mmap_writeable counter here?
I've checked why your v2 version broke i915 and I think the reason maybe
has nothing to do with i915. Just in case call_mmap() failed, it ended up
jumping to unmap_and_free_vma which calls mapping_unmap_writable() but we
didn't call mapping_map_writable() yet so the counter became imbalanced.
So I'd be for returning to v2 version, just fix up the error handling
paths...
Honza
> +
> /*
> * Expansion is handled above, merging is handled below.
> * Drivers should not alter the address of the VMA.
> --
> 2.42.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap()
2023-10-11 9:46 ` Jan Kara
@ 2023-10-11 18:14 ` Lorenzo Stoakes
2023-10-12 8:38 ` Jan Kara
0 siblings, 1 reply; 7+ messages in thread
From: Lorenzo Stoakes @ 2023-10-11 18:14 UTC (permalink / raw)
To: Jan Kara
Cc: linux-mm, linux-kernel, Andrew Morton, Mike Kravetz, Muchun Song,
Alexander Viro, Christian Brauner, Matthew Wilcox, Hugh Dickins,
Andy Lutomirski, linux-fsdevel, bpf
On Wed, Oct 11, 2023 at 11:46:27AM +0200, Jan Kara wrote:
> On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote:
> > In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to
> > clear VM_MAYWRITE in seal_check_write() we must be able to invoke either
> > the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so.
> >
> > We would otherwise fail the mapping_map_writable() check before we had
> > the opportunity to clear VM_MAYWRITE.
> >
> > However, the existing logic in mmap_region() performs this check BEFORE
> > calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce
> > this check AFTER the function call.
> >
> > In order to avoid any risk of breaking call_mmap() handlers which assume
> > this will have been done first, we continue to mark the file writable
> > first, simply deferring enforcement of it failing until afterwards.
> >
> > This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's
> > sealed via F_SEAL_WRITE to succeed, whereas previously they were not
> > permitted.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
> > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
>
> ...
>
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 6f6856b3267a..9fbee92aaaee 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > vma->vm_pgoff = pgoff;
> >
> > if (file) {
> > - if (is_shared_maywrite(vm_flags)) {
> > - error = mapping_map_writable(file->f_mapping);
> > - if (error)
> > - goto free_vma;
> > - }
> > + int writable_error = 0;
> > +
> > + if (vma_is_shared_maywrite(vma))
> > + writable_error = mapping_map_writable(file->f_mapping);
> >
> > vma->vm_file = get_file(file);
> > error = call_mmap(file, vma);
> > if (error)
> > goto unmap_and_free_vma;
> >
> > + /*
> > + * call_mmap() may have changed VMA flags, so retry this check
> > + * if it failed before.
> > + */
> > + if (writable_error && vma_is_shared_maywrite(vma)) {
> > + error = writable_error;
> > + goto close_and_free_vma;
> > + }
>
> Hum, this doesn't quite give me a peace of mind ;). One bug I can see is
> that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop
> i_mmap_writeable counter here?
This wouldn't be applicable in the F_SEAL_WRITE case, as the
i_mmap_writable counter would already have been decremented, and thus an
error would arise causing no further decrement, and everything would work
fine.
It'd be very odd for something to be writable here but the driver to make
it not writable. But we do need to account for this.
>
> I've checked why your v2 version broke i915 and I think the reason maybe
> has nothing to do with i915. Just in case call_mmap() failed, it ended up
> jumping to unmap_and_free_vma which calls mapping_unmap_writable() but we
> didn't call mapping_map_writable() yet so the counter became imbalanced.
yeah that must be the cause, I thought perhaps somehow
__remove_shared_vm_struct() got invoked by i915_gem_mmap() but I didn't
trace it through to see if it was possible.
Looking at it again, i don't think that is possible, as we hold a mmap/vma
write lock, and the only operations that can cause
__remove_shared_vm_struct() to run are things that would not be able to do
so with this lock held.
>
> So I'd be for returning to v2 version, just fix up the error handling
> paths...
So in conclusion, I agree, this is the better approach. Will respin in v4.
>
> Honza
>
>
> > +
> > /*
> > * Expansion is handled above, merging is handled below.
> > * Drivers should not alter the address of the VMA.
> > --
> > 2.42.0
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap()
2023-10-11 18:14 ` Lorenzo Stoakes
@ 2023-10-12 8:38 ` Jan Kara
0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2023-10-12 8:38 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Jan Kara, linux-mm, linux-kernel, Andrew Morton, Mike Kravetz,
Muchun Song, Alexander Viro, Christian Brauner, Matthew Wilcox,
Hugh Dickins, Andy Lutomirski, linux-fsdevel, bpf
On Wed 11-10-23 19:14:10, Lorenzo Stoakes wrote:
> On Wed, Oct 11, 2023 at 11:46:27AM +0200, Jan Kara wrote:
> > On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote:
> > > In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to
> > > clear VM_MAYWRITE in seal_check_write() we must be able to invoke either
> > > the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so.
> > >
> > > We would otherwise fail the mapping_map_writable() check before we had
> > > the opportunity to clear VM_MAYWRITE.
> > >
> > > However, the existing logic in mmap_region() performs this check BEFORE
> > > calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce
> > > this check AFTER the function call.
> > >
> > > In order to avoid any risk of breaking call_mmap() handlers which assume
> > > this will have been done first, we continue to mark the file writable
> > > first, simply deferring enforcement of it failing until afterwards.
> > >
> > > This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's
> > > sealed via F_SEAL_WRITE to succeed, whereas previously they were not
> > > permitted.
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
> > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> >
> > ...
> >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 6f6856b3267a..9fbee92aaaee 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > vma->vm_pgoff = pgoff;
> > >
> > > if (file) {
> > > - if (is_shared_maywrite(vm_flags)) {
> > > - error = mapping_map_writable(file->f_mapping);
> > > - if (error)
> > > - goto free_vma;
> > > - }
> > > + int writable_error = 0;
> > > +
> > > + if (vma_is_shared_maywrite(vma))
> > > + writable_error = mapping_map_writable(file->f_mapping);
> > >
> > > vma->vm_file = get_file(file);
> > > error = call_mmap(file, vma);
> > > if (error)
> > > goto unmap_and_free_vma;
> > >
> > > + /*
> > > + * call_mmap() may have changed VMA flags, so retry this check
> > > + * if it failed before.
> > > + */
> > > + if (writable_error && vma_is_shared_maywrite(vma)) {
> > > + error = writable_error;
> > > + goto close_and_free_vma;
> > > + }
> >
> > Hum, this doesn't quite give me a peace of mind ;). One bug I can see is
> > that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop
> > i_mmap_writeable counter here?
>
> This wouldn't be applicable in the F_SEAL_WRITE case, as the
> i_mmap_writable counter would already have been decremented, and thus an
> error would arise causing no further decrement, and everything would work
> fine.
>
> It'd be very odd for something to be writable here but the driver to make
> it not writable. But we do need to account for this.
Yeah, it may be odd but this is indeed what i915 driver appears to be
doing in i915_gem_object_mmap():
if (i915_gem_object_is_readonly(obj)) {
if (vma->vm_flags & VM_WRITE) {
i915_gem_object_put(obj);
return -EINVAL;
}
vm_flags_clear(vma, VM_MAYWRITE);
}
> > I've checked why your v2 version broke i915 and I think the reason maybe
> > has nothing to do with i915. Just in case call_mmap() failed, it ended up
> > jumping to unmap_and_free_vma which calls mapping_unmap_writable() but we
> > didn't call mapping_map_writable() yet so the counter became imbalanced.
>
> yeah that must be the cause, I thought perhaps somehow
> __remove_shared_vm_struct() got invoked by i915_gem_mmap() but I didn't
> trace it through to see if it was possible.
>
> Looking at it again, i don't think that is possible, as we hold a mmap/vma
> write lock, and the only operations that can cause
> __remove_shared_vm_struct() to run are things that would not be able to do
> so with this lock held.
>
> > So I'd be for returning to v2 version, just fix up the error handling
> > paths...
>
> So in conclusion, I agree, this is the better approach. Will respin in v4.
Thanks!
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 7+ messages in thread