linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mmap denywrite mess (Was: [GIT PULL] overlayfs fixes for 5.14-rc6)
@ 2021-08-11 14:11 Miklos Szeredi
  2021-08-11 14:45 ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Miklos Szeredi @ 2021-08-11 14:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christian König, Matthew Wilcox, linux-mm, linux-kernel,
	linux-fsdevel, linux-unionfs

On Mon, Aug 09, 2021 at 02:25:17PM -0700, Linus Torvalds wrote:

> Ugh. Th edances with denywrite and mapping_unmap_writable are really
> really annoying.

Attached version has error and success paths separated.  Was that your
complaint?

> I get the feeling that the whole thing with deny_write_access and
> mapping_map_writable could possibly be done after-the-fact somehow as
> part of actually inserting the vma in the vma tree, rather than done
> as the vma is prepared.

I don't know if that's doable or not.  The final denywrite count is obtained in
__vma_link_file(), called after __vma_link().  The questions are:

 - does the order of those helper calls matter?

 - if it does, could the __vma_link() be safely undone after an unsuccessful
   __vmal_link_file()?

> And most users of vma_set_file() probably really don't want that whole
> thing at all (ie the DRM stuff that just switches out a local thing.
> They also don't check for the new error cases you've added.

Christian König wants to follow up with those checks (which should be asserts,
if the code wasn't buggy in the first place).

> So I really think this is quite questionable, and those cases should
> probably have been done entirely inside ovlfs rather than polluting
> the cases that don't care and don't check.

I don't get that.  mmap_region() currently drops the deny counts from the
original file.  That doesn't work for overlayfs since it needs to take new temp
counts on the override file.

So mmap_region() is changed to drop the counts on vma->vm_file, but then all
callers of vma_set_file() will need to do that switch of temp counts, there's no
way around that.

Thanks,
Miklos

For reference, here's the previous discussion:

https://lore.kernel.org/linux-mm/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/

---
 fs/overlayfs/file.c |    4 +++-
 include/linux/mm.h  |    2 +-
 mm/mmap.c           |    2 +-
 mm/util.c           |   31 ++++++++++++++++++++++++++++++-
 4 files changed, 35 insertions(+), 4 deletions(-)

--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -475,7 +475,9 @@ static int ovl_mmap(struct file *file, s
 	if (WARN_ON(file != vma->vm_file))
 		return -EIO;
 
-	vma_set_file(vma, realfile);
+	ret = vma_set_file(vma, realfile);
+	if (ret)
+		return ret;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = call_mmap(vma->vm_file, vma);
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2780,7 +2780,7 @@ static inline void vma_set_page_prot(str
 }
 #endif
 
-void vma_set_file(struct vm_area_struct *vma, struct file *file);
+int /* __must_check */ vma_set_file(struct vm_area_struct *vma, struct file *file);
 
 #ifdef CONFIG_NUMA_BALANCING
 unsigned long change_prot_numa(struct vm_area_struct *vma,
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1806,6 +1806,7 @@ unsigned long mmap_region(struct file *f
 		 */
 		vma->vm_file = get_file(file);
 		error = call_mmap(file, vma);
+		file = vma->vm_file;
 		if (error)
 			goto unmap_and_free_vma;
 
@@ -1867,7 +1868,6 @@ unsigned long mmap_region(struct file *f
 		if (vm_flags & VM_DENYWRITE)
 			allow_write_access(file);
 	}
-	file = vma->vm_file;
 out:
 	perf_event_mmap(vma);
 
--- a/mm/util.c
+++ b/mm/util.c
@@ -314,12 +314,41 @@ int vma_is_stack_for_current(struct vm_a
 /*
  * Change backing file, only valid to use during initial VMA setup.
  */
-void vma_set_file(struct vm_area_struct *vma, struct file *file)
+int vma_set_file(struct vm_area_struct *vma, struct file *file)
 {
+	vm_flags_t vm_flags = vma->vm_flags;
+	int err;
+
+	/* Get temporary denial counts on replacement */
+	if (vm_flags & VM_DENYWRITE) {
+		err = deny_write_access(file);
+		if (err)
+			return err;
+	}
+	if (vm_flags & VM_SHARED) {
+		err = mapping_map_writable(file->f_mapping);
+		if (err)
+			goto undo_denywrite;
+	}
+
 	/* Changing an anonymous vma with this is illegal */
 	get_file(file);
 	swap(vma->vm_file, file);
+
+	/* Undo temporary denial counts on replaced */
+	if (vm_flags & VM_SHARED)
+		mapping_unmap_writable(file->f_mapping);
+
+	if (vm_flags & VM_DENYWRITE)
+		allow_write_access(file);
+
 	fput(file);
+	return 0;
+
+undo_denywrite:
+	if (vm_flags & VM_DENYWRITE)
+		allow_write_access(file);
+	return err;
 }
 EXPORT_SYMBOL(vma_set_file);
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: mmap denywrite mess (Was: [GIT PULL] overlayfs fixes for 5.14-rc6)
  2021-08-11 14:11 mmap denywrite mess (Was: [GIT PULL] overlayfs fixes for 5.14-rc6) Miklos Szeredi
@ 2021-08-11 14:45 ` David Hildenbrand
  2021-08-11 16:20   ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand @ 2021-08-11 14:45 UTC (permalink / raw)
  To: Miklos Szeredi, Linus Torvalds
  Cc: Christian König, Matthew Wilcox, linux-mm, linux-kernel,
	linux-fsdevel, linux-unionfs

On 11.08.21 16:11, Miklos Szeredi wrote:
> On Mon, Aug 09, 2021 at 02:25:17PM -0700, Linus Torvalds wrote:
> 
>> Ugh. Th edances with denywrite and mapping_unmap_writable are really
>> really annoying.
> 
> Attached version has error and success paths separated.  Was that your
> complaint?
> 
>> I get the feeling that the whole thing with deny_write_access and
>> mapping_map_writable could possibly be done after-the-fact somehow as
>> part of actually inserting the vma in the vma tree, rather than done
>> as the vma is prepared.
> 
> I don't know if that's doable or not.  The final denywrite count is obtained in
> __vma_link_file(), called after __vma_link().  The questions are:
> 
>   - does the order of those helper calls matter?
> 
>   - if it does, could the __vma_link() be safely undone after an unsuccessful
>     __vmal_link_file()?
> 
>> And most users of vma_set_file() probably really don't want that whole
>> thing at all (ie the DRM stuff that just switches out a local thing.
>> They also don't check for the new error cases you've added.
> 
> Christian König wants to follow up with those checks (which should be asserts,
> if the code wasn't buggy in the first place).
> 
>> So I really think this is quite questionable, and those cases should
>> probably have been done entirely inside ovlfs rather than polluting
>> the cases that don't care and don't check.
> 
> I don't get that.  mmap_region() currently drops the deny counts from the
> original file.  That doesn't work for overlayfs since it needs to take new temp
> counts on the override file.
> 
> So mmap_region() is changed to drop the counts on vma->vm_file, but then all
> callers of vma_set_file() will need to do that switch of temp counts, there's no
> way around that.
> 
> Thanks,
> Miklos
> 
> For reference, here's the previous discussion:
> 
> https://lore.kernel.org/linux-mm/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/
> 
> ---
>   fs/overlayfs/file.c |    4 +++-
>   include/linux/mm.h  |    2 +-
>   mm/mmap.c           |    2 +-
>   mm/util.c           |   31 ++++++++++++++++++++++++++++++-
>   4 files changed, 35 insertions(+), 4 deletions(-)
> 
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -475,7 +475,9 @@ static int ovl_mmap(struct file *file, s
>   	if (WARN_ON(file != vma->vm_file))
>   		return -EIO;
>   
> -	vma_set_file(vma, realfile);
> +	ret = vma_set_file(vma, realfile);
> +	if (ret)
> +		return ret;
>   
>   	old_cred = ovl_override_creds(file_inode(file)->i_sb);
>   	ret = call_mmap(vma->vm_file, vma);
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2780,7 +2780,7 @@ static inline void vma_set_page_prot(str
>   }
>   #endif
>   
> -void vma_set_file(struct vm_area_struct *vma, struct file *file);
> +int /* __must_check */ vma_set_file(struct vm_area_struct *vma, struct file *file);
>   
>   #ifdef CONFIG_NUMA_BALANCING
>   unsigned long change_prot_numa(struct vm_area_struct *vma,
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1806,6 +1806,7 @@ unsigned long mmap_region(struct file *f
>   		 */
>   		vma->vm_file = get_file(file);
>   		error = call_mmap(file, vma);
> +		file = vma->vm_file;
>   		if (error)
>   			goto unmap_and_free_vma;
>   
> @@ -1867,7 +1868,6 @@ unsigned long mmap_region(struct file *f
>   		if (vm_flags & VM_DENYWRITE)
>   			allow_write_access(file);
>   	}
> -	file = vma->vm_file;
>   out:
>   	perf_event_mmap(vma);
>   
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -314,12 +314,41 @@ int vma_is_stack_for_current(struct vm_a
>   /*
>    * Change backing file, only valid to use during initial VMA setup.
>    */
> -void vma_set_file(struct vm_area_struct *vma, struct file *file)
> +int vma_set_file(struct vm_area_struct *vma, struct file *file)
>   {
> +	vm_flags_t vm_flags = vma->vm_flags;
> +	int err;
> +
> +	/* Get temporary denial counts on replacement */
> +	if (vm_flags & VM_DENYWRITE) {
> +		err = deny_write_access(file);
> +		if (err)
> +			return err;
> +	}
> +	if (vm_flags & VM_SHARED) {
> +		err = mapping_map_writable(file->f_mapping);
> +		if (err)
> +			goto undo_denywrite;
> +	}
> +
>   	/* Changing an anonymous vma with this is illegal */
>   	get_file(file);
>   	swap(vma->vm_file, file);
> +
> +	/* Undo temporary denial counts on replaced */
> +	if (vm_flags & VM_SHARED)
> +		mapping_unmap_writable(file->f_mapping);
> +
> +	if (vm_flags & VM_DENYWRITE)
> +		allow_write_access(file);
> +
>   	fput(file);
> +	return 0;
> +
> +undo_denywrite:
> +	if (vm_flags & VM_DENYWRITE)
> +		allow_write_access(file);
> +	return err;
>   }
>   EXPORT_SYMBOL(vma_set_file);
>   
> 

I proposed a while ago to get rid of VM_DENYWRITE completely:

https://lkml.kernel.org/r/20210423131640.20080-1-david@redhat.com

I haven't looked how much it still applies to current upstream, but 
maybe that might help cleaning up that code.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: mmap denywrite mess (Was: [GIT PULL] overlayfs fixes for 5.14-rc6)
  2021-08-11 14:45 ` David Hildenbrand
@ 2021-08-11 16:20   ` Linus Torvalds
  2021-08-12  7:12     ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2021-08-11 16:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Miklos Szeredi, Christian König, Matthew Wilcox, linux-mm,
	Linux Kernel Mailing List, linux-fsdevel, linux-unionfs

On Wed, Aug 11, 2021 at 4:45 AM David Hildenbrand <david@redhat.com> wrote:
>
> I proposed a while ago to get rid of VM_DENYWRITE completely:
>
> https://lkml.kernel.org/r/20210423131640.20080-1-david@redhat.com
>
> I haven't looked how much it still applies to current upstream, but
> maybe that might help cleaning up that code.

I like it.

I agree that we could - and probably should - just do it this way.

We don't expose MAP_DENYWRITE to user space any more - and the old
legacy library loading code certainly isn't worth it - and so
effectively the only way to set it is with execve().

And yes, it gets rid of all the silly games with the per-mapping flags.

               Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: mmap denywrite mess (Was: [GIT PULL] overlayfs fixes for 5.14-rc6)
  2021-08-11 16:20   ` Linus Torvalds
@ 2021-08-12  7:12     ` David Hildenbrand
  0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand @ 2021-08-12  7:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, Christian König, Matthew Wilcox, linux-mm,
	Linux Kernel Mailing List, linux-fsdevel, linux-unionfs

On 11.08.21 18:20, Linus Torvalds wrote:
> On Wed, Aug 11, 2021 at 4:45 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> I proposed a while ago to get rid of VM_DENYWRITE completely:
>>
>> https://lkml.kernel.org/r/20210423131640.20080-1-david@redhat.com
>>
>> I haven't looked how much it still applies to current upstream, but
>> maybe that might help cleaning up that code.
> 
> I like it.
> 
> I agree that we could - and probably should - just do it this way.
> 
> We don't expose MAP_DENYWRITE to user space any more - and the old
> legacy library loading code certainly isn't worth it - and so
> effectively the only way to set it is with execve().
> 
> And yes, it gets rid of all the silly games with the per-mapping flags.

I'll rebase, retest and resend, putting you on cc. Then we can discuss 
if/how/when we might want to go that path.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-12  7:12 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 14:11 mmap denywrite mess (Was: [GIT PULL] overlayfs fixes for 5.14-rc6) Miklos Szeredi
2021-08-11 14:45 ` David Hildenbrand
2021-08-11 16:20   ` Linus Torvalds
2021-08-12  7:12     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).