LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split
@ 2020-07-04 20:56 Richard Weinberger
  2020-07-13  7:42 ` Richard Weinberger
  2020-07-13 12:35 ` Eric W. Biederman
  0 siblings, 2 replies; 3+ messages in thread
From: Richard Weinberger @ 2020-07-04 20:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: tj, gregkh, ebiederm, dan.j.williams, Richard Weinberger

10 years ago commit a6849fa1f7d7 ("sysfs: Fail bin file mmap if vma close is implemented.")
removed support for vm_ops->close() for mmap on sysfs.
As far I understand the reason is that due to the wrapping in kernfs
every VMA split operation needs to be tracked to call vm_ops->close()
for all fragments. This is not feasible with reasonable effort.

Since commit 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
we can get notified as soon a VMA is split, this can help to relax the restriction.
So I propose to allow having a custom close under the condition that a
VMA cannot get split.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 fs/kernfs/file.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 06b342d8462b..82b09e72acff 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -347,6 +347,38 @@ static void kernfs_vma_open(struct vm_area_struct *vma)
 	kernfs_put_active(of->kn);
 }
 
+static void kernfs_vma_close(struct vm_area_struct *vma)
+{
+	struct file *file = vma->vm_file;
+	struct kernfs_open_file *of = kernfs_of(file);
+
+	if (!of->vm_ops)
+		return;
+
+	if (!kernfs_get_active(of->kn))
+		return;
+
+	if (of->vm_ops->close)
+		of->vm_ops->close(vma);
+
+	kernfs_put_active(of->kn);
+}
+
+static int kernfs_vma_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct file *file = vma->vm_file;
+	struct kernfs_open_file *of = kernfs_of(file);
+
+	/*
+	 * We cannot keep track of split operations,
+	 * so refuse splitting a VMA if someone uses close.
+	 */
+	if (of->vm_ops && of->vm_ops->close)
+		return -EINVAL;
+
+	return 0;
+}
+
 static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
 {
 	struct file *file = vmf->vma->vm_file;
@@ -457,6 +489,8 @@ static struct mempolicy *kernfs_vma_get_policy(struct vm_area_struct *vma,
 
 static const struct vm_operations_struct kernfs_vm_ops = {
 	.open		= kernfs_vma_open,
+	.close		= kernfs_vma_close,
+	.split		= kernfs_vma_split,
 	.fault		= kernfs_vma_fault,
 	.page_mkwrite	= kernfs_vma_page_mkwrite,
 	.access		= kernfs_vma_access,
@@ -505,14 +539,6 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
 	if (of->mmapped && of->vm_ops != vma->vm_ops)
 		goto out_put;
 
-	/*
-	 * It is not possible to successfully wrap close.
-	 * So error if someone is trying to use close.
-	 */
-	rc = -EINVAL;
-	if (vma->vm_ops && vma->vm_ops->close)
-		goto out_put;
-
 	rc = 0;
 	of->mmapped = true;
 	of->vm_ops = vma->vm_ops;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split
  2020-07-04 20:56 [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split Richard Weinberger
@ 2020-07-13  7:42 ` Richard Weinberger
  2020-07-13 12:35 ` Eric W. Biederman
  1 sibling, 0 replies; 3+ messages in thread
From: Richard Weinberger @ 2020-07-13  7:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: tj, Greg Kroah-Hartman, ebiederm, dan j williams

----- Ursprüngliche Mail -----
> Von: "richard" <richard@nod.at>
> An: "linux-kernel" <linux-kernel@vger.kernel.org>
> CC: tj@kernel.org, "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>, ebiederm@xmission.com, "dan j williams"
> <dan.j.williams@intel.com>, "richard" <richard@nod.at>
> Gesendet: Samstag, 4. Juli 2020 22:56:19
> Betreff: [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split

> 10 years ago commit a6849fa1f7d7 ("sysfs: Fail bin file mmap if vma close is
> implemented.")
> removed support for vm_ops->close() for mmap on sysfs.
> As far I understand the reason is that due to the wrapping in kernfs
> every VMA split operation needs to be tracked to call vm_ops->close()
> for all fragments. This is not feasible with reasonable effort.
> 
> Since commit 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to
> vm_operations_struct")
> we can get notified as soon a VMA is split, this can help to relax the
> restriction.
> So I propose to allow having a custom close under the condition that a
> VMA cannot get split.
> 
> Signed-off-by: Richard Weinberger <richard@nod.at>

*friendly ping*

Thanks,
//richard

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split
  2020-07-04 20:56 [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split Richard Weinberger
  2020-07-13  7:42 ` Richard Weinberger
@ 2020-07-13 12:35 ` Eric W. Biederman
  1 sibling, 0 replies; 3+ messages in thread
From: Eric W. Biederman @ 2020-07-13 12:35 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-kernel, tj, gregkh, dan.j.williams

Richard Weinberger <richard@nod.at> writes:

> 10 years ago commit a6849fa1f7d7 ("sysfs: Fail bin file mmap if vma close is implemented.")
> removed support for vm_ops->close() for mmap on sysfs.
> As far I understand the reason is that due to the wrapping in kernfs
> every VMA split operation needs to be tracked to call vm_ops->close()
> for all fragments. This is not feasible with reasonable effort.
>
> Since commit 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
> we can get notified as soon a VMA is split, this can help to relax the restriction.
> So I propose to allow having a custom close under the condition that a
> VMA cannot get split.

The existence of split may help but your patch does not address the
problem.

The difficult to handle case is not split.  Simply wrapping
vma->vm_ops->open and vma->vm_ops->close is enough to handle split.  The
challenge is when the kernel revokes acess to a kernfs file.  Revokes
happen when the module supporting a kernfs file is removed, or when a
piece of hardware is removed.

The code path for that revocation looks like:
	kernfs_remove or kernfs_remove_self
		__kernfs_remove
			kernfs_drain
        			kernfs_drain_open_files
                			unmap_mapping_range


The fundamental problem is that at the time of unmap_mapping_range the
code if it is to implement vma->vm_ops->close properly needs to call
close on all of the vmas.  The code can not wait longer as after
kernfs_remove returns the kernfs code makes the guarantee that the code
implementing vma->vm_ops->close has been removed from the kernel.

The technical challenge is how to implement walking the list of vmas and
call the vma->vops->close method on all of them at revoke time.  Last
I looked taking the locks that are necessary to do that is not safe to
do when holding the locks that are held in the kernfs_remove path.

A lot has changed in kernfs and the mm subsystem since I examined it,
so it might be worth looking at this again.

So in short the problem is that the code does not exist to call close
when the kernfs file is revoked.  Your patch below does not implement
that code.

Eric

>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  fs/kernfs/file.c | 42 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 06b342d8462b..82b09e72acff 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -347,6 +347,38 @@ static void kernfs_vma_open(struct vm_area_struct *vma)
>  	kernfs_put_active(of->kn);
>  }
>  
> +static void kernfs_vma_close(struct vm_area_struct *vma)
> +{
> +	struct file *file = vma->vm_file;
> +	struct kernfs_open_file *of = kernfs_of(file);
> +
> +	if (!of->vm_ops)
> +		return;
> +
> +	if (!kernfs_get_active(of->kn))
> +		return;
> +
> +	if (of->vm_ops->close)
> +		of->vm_ops->close(vma);
> +
> +	kernfs_put_active(of->kn);
> +}
> +
> +static int kernfs_vma_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct file *file = vma->vm_file;
> +	struct kernfs_open_file *of = kernfs_of(file);
> +
> +	/*
> +	 * We cannot keep track of split operations,
> +	 * so refuse splitting a VMA if someone uses close.
> +	 */
> +	if (of->vm_ops && of->vm_ops->close)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
>  {
>  	struct file *file = vmf->vma->vm_file;
> @@ -457,6 +489,8 @@ static struct mempolicy *kernfs_vma_get_policy(struct vm_area_struct *vma,
>  
>  static const struct vm_operations_struct kernfs_vm_ops = {
>  	.open		= kernfs_vma_open,
> +	.close		= kernfs_vma_close,
> +	.split		= kernfs_vma_split,
>  	.fault		= kernfs_vma_fault,
>  	.page_mkwrite	= kernfs_vma_page_mkwrite,
>  	.access		= kernfs_vma_access,
> @@ -505,14 +539,6 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
>  	if (of->mmapped && of->vm_ops != vma->vm_ops)
>  		goto out_put;
>  
> -	/*
> -	 * It is not possible to successfully wrap close.
> -	 * So error if someone is trying to use close.
> -	 */
> -	rc = -EINVAL;
> -	if (vma->vm_ops && vma->vm_ops->close)
> -		goto out_put;
> -
>  	rc = 0;
>  	of->mmapped = true;
>  	of->vm_ops = vma->vm_ops;

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-04 20:56 [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split Richard Weinberger
2020-07-13  7:42 ` Richard Weinberger
2020-07-13 12:35 ` Eric W. Biederman

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git