linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
@ 2021-03-22 21:58 Collin Fijalkovich
  2021-03-22 23:55 ` Song Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Collin Fijalkovich @ 2021-03-22 21:58 UTC (permalink / raw)
  Cc: songliubraving, surenb, hridya, kaleshsingh, hughd, timmurray,
	Collin Fijalkovich, Alexander Viro, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm

Transparent huge pages are supported for read-only non-shmem filesystems,
but are only used for vmas with VM_DENYWRITE. This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY).  Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open(). Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().

Systems that make heavy use of shared libraries (e.g. Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.

This patch reduces the constraint on file THPs allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()). It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.

Restricting the use of THPs to executable mappings eliminates the risk that
a read-only file later opened for write would encounter significant
latencies due to page cache truncation.

The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout. The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping to
be backed with THPs.

Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
---
 fs/open.c       | 13 +++++++++++--
 mm/khugepaged.c | 16 +++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e53af13b5835..f76e960d10ea 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
 	 * XXX: Huge page cache doesn't support writing yet. Drop all page
 	 * cache for this file before processing writes.
 	 */
-	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
-		truncate_pagecache(inode, 0);
+	if (f->f_mode & FMODE_WRITE) {
+		/*
+		 * Paired with smp_mb() in collapse_file() to ensure nr_thps
+		 * is up to date and the update to i_writecount by
+		 * get_write_access() is visible. Ensures subsequent insertion
+		 * of THPs into the page cache will fail.
+		 */
+		smp_mb();
+		if (filemap_nr_thps(inode->i_mapping))
+			truncate_pagecache(inode, 0);
+	}
 
 	return 0;
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a7d6cb912b05..4c7cc877d5e3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 
 	/* Read-only file mappings need to be aligned for THP to work. */
 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
-	    (vm_flags & VM_DENYWRITE)) {
+	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
+	    (vm_flags & VM_EXEC)) {
 		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
 				HPAGE_PMD_NR);
 	}
@@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
 	else {
 		__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
 		filemap_nr_thps_inc(mapping);
+		/*
+		 * Paired with smp_mb() in do_dentry_open() to ensure
+		 * i_writecount is up to date and the update to nr_thps is
+		 * visible. Ensures the page cache will be truncated if the
+		 * file is opened writable.
+		 */
+		smp_mb();
+		if (inode_is_open_for_write(mapping->host)) {
+			result = SCAN_FAIL;
+			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
+			filemap_nr_thps_dec(mapping);
+			goto xa_locked;
+		}
 	}
 
 	if (nr_none) {
-- 
2.31.0.rc2.261.g7f71774620-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-03-22 21:58 [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Collin Fijalkovich
@ 2021-03-22 23:55 ` Song Liu
       [not found]   ` <CAL+PeoGfCmbMSEYgaJNPHWfLvmmXJGaEM5G6rFstKzhTeY=2yw@mail.gmail.com>
  2021-03-24  3:51 ` William Kucharski
  2021-04-06  0:15 ` Collin Fijalkovich
  2 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2021-03-22 23:55 UTC (permalink / raw)
  To: Collin Fijalkovich
  Cc: Song Liu, surenb, hridya, kaleshsingh, Hugh Dickins, timmurray,
	Alexander Viro, Andrew Morton, Linux-Fsdevel, open list,
	Linux-MM

On Mon, Mar 22, 2021 at 3:00 PM Collin Fijalkovich
<cfijalkovich@google.com> wrote:
>
> Transparent huge pages are supported for read-only non-shmem filesystems,
> but are only used for vmas with VM_DENYWRITE. This condition ensures that
> file THPs are protected from writes while an application is running
> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> when a file is opened for write in do_dentry_open(). Since sys_mmap
> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> produced by execve().
>
> Systems that make heavy use of shared libraries (e.g. Android) are unable
> to apply VM_DENYWRITE through the dynamic linker, preventing them from
> benefiting from the resultant reduced contention on the TLB.
>
> This patch reduces the constraint on file THPs allowing use with any
> executable mapping from a file not opened for write (see
> inode_is_open_for_write()). It also introduces additional conditions to
> ensure that files opened for write will never be backed by file THPs.

Thanks for working on this. We could also use this in many data center
workloads.

Question: when we use this on shared library, the library is still
writable. When the
shared library is opened for write, these pages will refault in as 4kB
pages, right?

Thanks,
Song

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-03-22 21:58 [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Collin Fijalkovich
  2021-03-22 23:55 ` Song Liu
@ 2021-03-24  3:51 ` William Kucharski
  2021-04-06  0:15 ` Collin Fijalkovich
  2 siblings, 0 replies; 9+ messages in thread
From: William Kucharski @ 2021-03-24  3:51 UTC (permalink / raw)
  To: Collin Fijalkovich
  Cc: Song Liu, surenb, hridya, kaleshsingh, hughd, timmurray,
	Alexander Viro, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm

I like this, it reminds me of the changes I proposed a few years ago to try
to automatically map read-only text regions of appropriate sizes and
alignment with THPs.

My concern had always been whether commercial software and distro vendors
would buy into supplying the appropriate linker flags when compiling; your
solution is at the very least a good head start down that road.

Matthew Wilcox and I had noticed a lot of ordinary apps such as gcc, Chrome
and Firefox would benefit from such mappings; have you tried building any
of those with the appropriate linker flag to see how they might benefit
from the change?

Thanks,
    -- Bill


> On Mar 22, 2021, at 3:58 PM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
> 
> Transparent huge pages are supported for read-only non-shmem filesystems,
> but are only used for vmas with VM_DENYWRITE. This condition ensures that
> file THPs are protected from writes while an application is running
> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> when a file is opened for write in do_dentry_open(). Since sys_mmap
> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> produced by execve().
> 
> Systems that make heavy use of shared libraries (e.g. Android) are unable
> to apply VM_DENYWRITE through the dynamic linker, preventing them from
> benefiting from the resultant reduced contention on the TLB.
> 
> This patch reduces the constraint on file THPs allowing use with any
> executable mapping from a file not opened for write (see
> inode_is_open_for_write()). It also introduces additional conditions to
> ensure that files opened for write will never be backed by file THPs.
> 
> Restricting the use of THPs to executable mappings eliminates the risk that
> a read-only file later opened for write would encounter significant
> latencies due to page cache truncation.
> 
> The ld linker flag '-z max-page-size=(hugepage size)' can be used to
> produce executables with the necessary layout. The dynamic linker must
> map these file's segments at a hugepage size aligned vma for the mapping to
> be backed with THPs.
> 
> Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
> ---
> fs/open.c       | 13 +++++++++++--
> mm/khugepaged.c | 16 +++++++++++++++-
> 2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index e53af13b5835..f76e960d10ea 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
> 	 * XXX: Huge page cache doesn't support writing yet. Drop all page
> 	 * cache for this file before processing writes.
> 	 */
> -	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
> -		truncate_pagecache(inode, 0);
> +	if (f->f_mode & FMODE_WRITE) {
> +		/*
> +		 * Paired with smp_mb() in collapse_file() to ensure nr_thps
> +		 * is up to date and the update to i_writecount by
> +		 * get_write_access() is visible. Ensures subsequent insertion
> +		 * of THPs into the page cache will fail.
> +		 */
> +		smp_mb();
> +		if (filemap_nr_thps(inode->i_mapping))
> +			truncate_pagecache(inode, 0);
> +	}
> 
> 	return 0;
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a7d6cb912b05..4c7cc877d5e3 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
> 
> 	/* Read-only file mappings need to be aligned for THP to work. */
> 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> -	    (vm_flags & VM_DENYWRITE)) {
> +	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
> +	    (vm_flags & VM_EXEC)) {
> 		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> 				HPAGE_PMD_NR);
> 	}
> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
> 	else {
> 		__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
> 		filemap_nr_thps_inc(mapping);
> +		/*
> +		 * Paired with smp_mb() in do_dentry_open() to ensure
> +		 * i_writecount is up to date and the update to nr_thps is
> +		 * visible. Ensures the page cache will be truncated if the
> +		 * file is opened writable.
> +		 */
> +		smp_mb();
> +		if (inode_is_open_for_write(mapping->host)) {
> +			result = SCAN_FAIL;
> +			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
> +			filemap_nr_thps_dec(mapping);
> +			goto xa_locked;
> +		}
> 	}
> 
> 	if (nr_none) {
> -- 
> 2.31.0.rc2.261.g7f71774620-goog
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
       [not found]   ` <CAL+PeoGfCmbMSEYgaJNPHWfLvmmXJGaEM5G6rFstKzhTeY=2yw@mail.gmail.com>
@ 2021-03-28 16:45     ` Song Liu
  2021-03-30 20:00       ` Collin Fijalkovich
  0 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2021-03-28 16:45 UTC (permalink / raw)
  To: Collin Fijalkovich
  Cc: Song Liu, Suren Baghdasaryan, Hridya Valsaraju, Kalesh Singh,
	Hugh Dickins, Tim Murray, Alexander Viro, Andrew Morton,
	Linux-Fsdevel, open list, Linux-MM



> On Mar 23, 2021, at 10:13 AM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
> 
> Question: when we use this on shared library, the library is still
> writable. When the
> shared library is opened for write, these pages will refault in as 4kB
> pages, right? 
> 
> That's correct, while a file is opened for write it will refault into 4kB pages and block use of THPs. Once all writers complete (i_writecount <=0), the file can fault into THPs again and khugepaged can collapse existing page ranges provided that it can successfully allocate new huge pages.

Will it be a problem if a slow writer (say a slow scp) writes to the 
shared library while the shared library is in use? 

Thanks,
Song

> 
> From,
> Collin 
> 
> On Mon, Mar 22, 2021 at 4:55 PM Song Liu <song@kernel.org> wrote:
> On Mon, Mar 22, 2021 at 3:00 PM Collin Fijalkovich
> <cfijalkovich@google.com> wrote:
> >
> > Transparent huge pages are supported for read-only non-shmem filesystems,
> > but are only used for vmas with VM_DENYWRITE. This condition ensures that
> > file THPs are protected from writes while an application is running
> > (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> > when a file is opened for write in do_dentry_open(). Since sys_mmap
> > ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> > produced by execve().
> >
> > Systems that make heavy use of shared libraries (e.g. Android) are unable
> > to apply VM_DENYWRITE through the dynamic linker, preventing them from
> > benefiting from the resultant reduced contention on the TLB.
> >
> > This patch reduces the constraint on file THPs allowing use with any
> > executable mapping from a file not opened for write (see
> > inode_is_open_for_write()). It also introduces additional conditions to
> > ensure that files opened for write will never be backed by file THPs.
> 
> Thanks for working on this. We could also use this in many data center
> workloads.
> 
> Question: when we use this on shared library, the library is still
> writable. When the
> shared library is opened for write, these pages will refault in as 4kB
> pages, right?
> 
> Thanks,
> Song


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-03-28 16:45     ` Song Liu
@ 2021-03-30 20:00       ` Collin Fijalkovich
  0 siblings, 0 replies; 9+ messages in thread
From: Collin Fijalkovich @ 2021-03-30 20:00 UTC (permalink / raw)
  To: Song Liu
  Cc: Song Liu, Suren Baghdasaryan, Hridya Valsaraju, Kalesh Singh,
	Hugh Dickins, Tim Murray, Alexander Viro, Andrew Morton,
	Linux-Fsdevel, open list, Linux-MM

There will be an immediate performance hit when the file is opened for
write, as its associated pages are removed from the page cache. While
the writer is present there will be the usual overhead of using 4kB
pages instead of THPs, but there should not be an additional penalty.
It is problematic if a file is repeatedly opened for write, as it will
need to refault each time.

- Collin


On Sun, Mar 28, 2021 at 9:45 AM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Mar 23, 2021, at 10:13 AM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
> >
> > Question: when we use this on shared library, the library is still
> > writable. When the
> > shared library is opened for write, these pages will refault in as 4kB
> > pages, right?
> >
> > That's correct, while a file is opened for write it will refault into 4kB pages and block use of THPs. Once all writers complete (i_writecount <=0), the file can fault into THPs again and khugepaged can collapse existing page ranges provided that it can successfully allocate new huge pages.
>
> Will it be a problem if a slow writer (say a slow scp) writes to the
> shared library while the shared library is in use?
>
> Thanks,
> Song
>
> >
> > From,
> > Collin
> >
> > On Mon, Mar 22, 2021 at 4:55 PM Song Liu <song@kernel.org> wrote:
> > On Mon, Mar 22, 2021 at 3:00 PM Collin Fijalkovich
> > <cfijalkovich@google.com> wrote:
> > >
> > > Transparent huge pages are supported for read-only non-shmem filesystems,
> > > but are only used for vmas with VM_DENYWRITE. This condition ensures that
> > > file THPs are protected from writes while an application is running
> > > (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> > > when a file is opened for write in do_dentry_open(). Since sys_mmap
> > > ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> > > produced by execve().
> > >
> > > Systems that make heavy use of shared libraries (e.g. Android) are unable
> > > to apply VM_DENYWRITE through the dynamic linker, preventing them from
> > > benefiting from the resultant reduced contention on the TLB.
> > >
> > > This patch reduces the constraint on file THPs allowing use with any
> > > executable mapping from a file not opened for write (see
> > > inode_is_open_for_write()). It also introduces additional conditions to
> > > ensure that files opened for write will never be backed by file THPs.
> >
> > Thanks for working on this. We could also use this in many data center
> > workloads.
> >
> > Question: when we use this on shared library, the library is still
> > writable. When the
> > shared library is opened for write, these pages will refault in as 4kB
> > pages, right?
> >
> > Thanks,
> > Song
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-03-22 21:58 [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Collin Fijalkovich
  2021-03-22 23:55 ` Song Liu
  2021-03-24  3:51 ` William Kucharski
@ 2021-04-06  0:15 ` Collin Fijalkovich
  2021-04-06  2:04   ` William Kucharski
  2 siblings, 1 reply; 9+ messages in thread
From: Collin Fijalkovich @ 2021-04-06  0:15 UTC (permalink / raw)
  Cc: Song Liu, Suren Baghdasaryan, Hridya Valsaraju, Kalesh Singh,
	Hugh Dickins, Tim Murray, Alexander Viro, Andrew Morton,
	Linux-Fsdevel, open list, Linux-MM

v2 has been uploaded with performance testing results:
https://lore.kernel.org/patchwork/patch/1408266/



On Mon, Mar 22, 2021 at 2:59 PM Collin Fijalkovich
<cfijalkovich@google.com> wrote:
>
> Transparent huge pages are supported for read-only non-shmem filesystems,
> but are only used for vmas with VM_DENYWRITE. This condition ensures that
> file THPs are protected from writes while an application is running
> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> when a file is opened for write in do_dentry_open(). Since sys_mmap
> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> produced by execve().
>
> Systems that make heavy use of shared libraries (e.g. Android) are unable
> to apply VM_DENYWRITE through the dynamic linker, preventing them from
> benefiting from the resultant reduced contention on the TLB.
>
> This patch reduces the constraint on file THPs allowing use with any
> executable mapping from a file not opened for write (see
> inode_is_open_for_write()). It also introduces additional conditions to
> ensure that files opened for write will never be backed by file THPs.
>
> Restricting the use of THPs to executable mappings eliminates the risk that
> a read-only file later opened for write would encounter significant
> latencies due to page cache truncation.
>
> The ld linker flag '-z max-page-size=(hugepage size)' can be used to
> produce executables with the necessary layout. The dynamic linker must
> map these file's segments at a hugepage size aligned vma for the mapping to
> be backed with THPs.
>
> Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
> ---
>  fs/open.c       | 13 +++++++++++--
>  mm/khugepaged.c | 16 +++++++++++++++-
>  2 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index e53af13b5835..f76e960d10ea 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
>          * XXX: Huge page cache doesn't support writing yet. Drop all page
>          * cache for this file before processing writes.
>          */
> -       if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
> -               truncate_pagecache(inode, 0);
> +       if (f->f_mode & FMODE_WRITE) {
> +               /*
> +                * Paired with smp_mb() in collapse_file() to ensure nr_thps
> +                * is up to date and the update to i_writecount by
> +                * get_write_access() is visible. Ensures subsequent insertion
> +                * of THPs into the page cache will fail.
> +                */
> +               smp_mb();
> +               if (filemap_nr_thps(inode->i_mapping))
> +                       truncate_pagecache(inode, 0);
> +       }
>
>         return 0;
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a7d6cb912b05..4c7cc877d5e3 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>
>         /* Read-only file mappings need to be aligned for THP to work. */
>         if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> -           (vm_flags & VM_DENYWRITE)) {
> +           !inode_is_open_for_write(vma->vm_file->f_inode) &&
> +           (vm_flags & VM_EXEC)) {
>                 return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>                                 HPAGE_PMD_NR);
>         }
> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
>         else {
>                 __mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
>                 filemap_nr_thps_inc(mapping);
> +               /*
> +                * Paired with smp_mb() in do_dentry_open() to ensure
> +                * i_writecount is up to date and the update to nr_thps is
> +                * visible. Ensures the page cache will be truncated if the
> +                * file is opened writable.
> +                */
> +               smp_mb();
> +               if (inode_is_open_for_write(mapping->host)) {
> +                       result = SCAN_FAIL;
> +                       __mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
> +                       filemap_nr_thps_dec(mapping);
> +                       goto xa_locked;
> +               }
>         }
>
>         if (nr_none) {
> --
> 2.31.0.rc2.261.g7f71774620-goog
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-04-06  0:15 ` Collin Fijalkovich
@ 2021-04-06  2:04   ` William Kucharski
  2021-04-06 17:48     ` Collin Fijalkovich
  0 siblings, 1 reply; 9+ messages in thread
From: William Kucharski @ 2021-04-06  2:04 UTC (permalink / raw)
  To: Collin Fijalkovich
  Cc: Song Liu, Suren Baghdasaryan, Hridya Valsaraju, Kalesh Singh,
	Hugh Dickins, Tim Murray, Alexander Viro, Andrew Morton,
	Linux-Fsdevel, open list, Linux-MM


I saw a similar change a few years ago with my prototype:

https://lore.kernel.org/linux-mm/5BB682E1-DD52-4AA9-83E9-DEF091E0C709@oracle.com/

the key being a very nice drop in iTLB-load-misses, so it looks like your code is
having the right effect.

What about the call to filemap_nr_thps_dec() in unaccount_page_cache_page() - does
that need an smp_mb() as well?

-- Bill

> On Apr 5, 2021, at 6:15 PM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
> 
> v2 has been uploaded with performance testing results:
> https://lore.kernel.org/patchwork/patch/1408266/
> 
> 
> 
> On Mon, Mar 22, 2021 at 2:59 PM Collin Fijalkovich
> <cfijalkovich@google.com> wrote:
>> 
>> Transparent huge pages are supported for read-only non-shmem filesystems,
>> but are only used for vmas with VM_DENYWRITE. This condition ensures that
>> file THPs are protected from writes while an application is running
>> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
>> when a file is opened for write in do_dentry_open(). Since sys_mmap
>> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
>> produced by execve().
>> 
>> Systems that make heavy use of shared libraries (e.g. Android) are unable
>> to apply VM_DENYWRITE through the dynamic linker, preventing them from
>> benefiting from the resultant reduced contention on the TLB.
>> 
>> This patch reduces the constraint on file THPs allowing use with any
>> executable mapping from a file not opened for write (see
>> inode_is_open_for_write()). It also introduces additional conditions to
>> ensure that files opened for write will never be backed by file THPs.
>> 
>> Restricting the use of THPs to executable mappings eliminates the risk that
>> a read-only file later opened for write would encounter significant
>> latencies due to page cache truncation.
>> 
>> The ld linker flag '-z max-page-size=(hugepage size)' can be used to
>> produce executables with the necessary layout. The dynamic linker must
>> map these file's segments at a hugepage size aligned vma for the mapping to
>> be backed with THPs.
>> 
>> Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
>> ---
>> fs/open.c       | 13 +++++++++++--
>> mm/khugepaged.c | 16 +++++++++++++++-
>> 2 files changed, 26 insertions(+), 3 deletions(-)
>> 
>> diff --git a/fs/open.c b/fs/open.c
>> index e53af13b5835..f76e960d10ea 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
>>         * XXX: Huge page cache doesn't support writing yet. Drop all page
>>         * cache for this file before processing writes.
>>         */
>> -       if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
>> -               truncate_pagecache(inode, 0);
>> +       if (f->f_mode & FMODE_WRITE) {
>> +               /*
>> +                * Paired with smp_mb() in collapse_file() to ensure nr_thps
>> +                * is up to date and the update to i_writecount by
>> +                * get_write_access() is visible. Ensures subsequent insertion
>> +                * of THPs into the page cache will fail.
>> +                */
>> +               smp_mb();
>> +               if (filemap_nr_thps(inode->i_mapping))
>> +                       truncate_pagecache(inode, 0);
>> +       }
>> 
>>        return 0;
>> 
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index a7d6cb912b05..4c7cc877d5e3 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>> 
>>        /* Read-only file mappings need to be aligned for THP to work. */
>>        if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
>> -           (vm_flags & VM_DENYWRITE)) {
>> +           !inode_is_open_for_write(vma->vm_file->f_inode) &&
>> +           (vm_flags & VM_EXEC)) {
>>                return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>>                                HPAGE_PMD_NR);
>>        }
>> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
>>        else {
>>                __mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
>>                filemap_nr_thps_inc(mapping);
>> +               /*
>> +                * Paired with smp_mb() in do_dentry_open() to ensure
>> +                * i_writecount is up to date and the update to nr_thps is
>> +                * visible. Ensures the page cache will be truncated if the
>> +                * file is opened writable.
>> +                */
>> +               smp_mb();
>> +               if (inode_is_open_for_write(mapping->host)) {
>> +                       result = SCAN_FAIL;
>> +                       __mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
>> +                       filemap_nr_thps_dec(mapping);
>> +                       goto xa_locked;
>> +               }
>>        }
>> 
>>        if (nr_none) {
>> --
>> 2.31.0.rc2.261.g7f71774620-goog
>> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-04-06  2:04   ` William Kucharski
@ 2021-04-06 17:48     ` Collin Fijalkovich
  2021-04-06 22:26       ` William Kucharski
  0 siblings, 1 reply; 9+ messages in thread
From: Collin Fijalkovich @ 2021-04-06 17:48 UTC (permalink / raw)
  To: William Kucharski
  Cc: Song Liu, Suren Baghdasaryan, Hridya Valsaraju, Kalesh Singh,
	Hugh Dickins, Tim Murray, Alexander Viro, Andrew Morton,
	Linux-Fsdevel, open list, Linux-MM

Instrumenting filemap_nr_thps_inc() should be sufficient for ensuring
writable file mappings will not be THP-backed.

If filemap_nr_thps_dec() in unaccount_page_cache_page() and
filemap_nr_thps() in do_dentry_open() race, the worst case is an
unnecessary truncation. We could introduce a memory barrier in
unaccount_page_cache_page(), but I'm unsure it would significantly
mitigate the risk of spurious truncations. Barring synchronization
between do_dentry_open() and ongoing page cache operations, there
could be an in-progress delete_from_page_cache_batch() that has not
yet updated accounting for THPs in its targeted range.

-- Collin

On Mon, Apr 5, 2021 at 7:05 PM William Kucharski
<william.kucharski@oracle.com> wrote:
>
>
> I saw a similar change a few years ago with my prototype:
>
> https://lore.kernel.org/linux-mm/5BB682E1-DD52-4AA9-83E9-DEF091E0C709@oracle.com/
>
> the key being a very nice drop in iTLB-load-misses, so it looks like your code is
> having the right effect.
>
> What about the call to filemap_nr_thps_dec() in unaccount_page_cache_page() - does
> that need an smp_mb() as well?
>
> -- Bill
>
> > On Apr 5, 2021, at 6:15 PM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
> >
> > v2 has been uploaded with performance testing results:
> > https://lore.kernel.org/patchwork/patch/1408266/
> >
> >
> >
> > On Mon, Mar 22, 2021 at 2:59 PM Collin Fijalkovich
> > <cfijalkovich@google.com> wrote:
> >>
> >> Transparent huge pages are supported for read-only non-shmem filesystems,
> >> but are only used for vmas with VM_DENYWRITE. This condition ensures that
> >> file THPs are protected from writes while an application is running
> >> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> >> when a file is opened for write in do_dentry_open(). Since sys_mmap
> >> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> >> produced by execve().
> >>
> >> Systems that make heavy use of shared libraries (e.g. Android) are unable
> >> to apply VM_DENYWRITE through the dynamic linker, preventing them from
> >> benefiting from the resultant reduced contention on the TLB.
> >>
> >> This patch reduces the constraint on file THPs allowing use with any
> >> executable mapping from a file not opened for write (see
> >> inode_is_open_for_write()). It also introduces additional conditions to
> >> ensure that files opened for write will never be backed by file THPs.
> >>
> >> Restricting the use of THPs to executable mappings eliminates the risk that
> >> a read-only file later opened for write would encounter significant
> >> latencies due to page cache truncation.
> >>
> >> The ld linker flag '-z max-page-size=(hugepage size)' can be used to
> >> produce executables with the necessary layout. The dynamic linker must
> >> map these file's segments at a hugepage size aligned vma for the mapping to
> >> be backed with THPs.
> >>
> >> Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
> >> ---
> >> fs/open.c       | 13 +++++++++++--
> >> mm/khugepaged.c | 16 +++++++++++++++-
> >> 2 files changed, 26 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/open.c b/fs/open.c
> >> index e53af13b5835..f76e960d10ea 100644
> >> --- a/fs/open.c
> >> +++ b/fs/open.c
> >> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
> >>         * XXX: Huge page cache doesn't support writing yet. Drop all page
> >>         * cache for this file before processing writes.
> >>         */
> >> -       if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
> >> -               truncate_pagecache(inode, 0);
> >> +       if (f->f_mode & FMODE_WRITE) {
> >> +               /*
> >> +                * Paired with smp_mb() in collapse_file() to ensure nr_thps
> >> +                * is up to date and the update to i_writecount by
> >> +                * get_write_access() is visible. Ensures subsequent insertion
> >> +                * of THPs into the page cache will fail.
> >> +                */
> >> +               smp_mb();
> >> +               if (filemap_nr_thps(inode->i_mapping))
> >> +                       truncate_pagecache(inode, 0);
> >> +       }
> >>
> >>        return 0;
> >>
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index a7d6cb912b05..4c7cc877d5e3 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
> >>
> >>        /* Read-only file mappings need to be aligned for THP to work. */
> >>        if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> >> -           (vm_flags & VM_DENYWRITE)) {
> >> +           !inode_is_open_for_write(vma->vm_file->f_inode) &&
> >> +           (vm_flags & VM_EXEC)) {
> >>                return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> >>                                HPAGE_PMD_NR);
> >>        }
> >> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
> >>        else {
> >>                __mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
> >>                filemap_nr_thps_inc(mapping);
> >> +               /*
> >> +                * Paired with smp_mb() in do_dentry_open() to ensure
> >> +                * i_writecount is up to date and the update to nr_thps is
> >> +                * visible. Ensures the page cache will be truncated if the
> >> +                * file is opened writable.
> >> +                */
> >> +               smp_mb();
> >> +               if (inode_is_open_for_write(mapping->host)) {
> >> +                       result = SCAN_FAIL;
> >> +                       __mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
> >> +                       filemap_nr_thps_dec(mapping);
> >> +                       goto xa_locked;
> >> +               }
> >>        }
> >>
> >>        if (nr_none) {
> >> --
> >> 2.31.0.rc2.261.g7f71774620-goog
> >>
> >
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-04-06 17:48     ` Collin Fijalkovich
@ 2021-04-06 22:26       ` William Kucharski
  0 siblings, 0 replies; 9+ messages in thread
From: William Kucharski @ 2021-04-06 22:26 UTC (permalink / raw)
  To: Collin Fijalkovich
  Cc: Song Liu, Suren Baghdasaryan, Hridya Valsaraju, Kalesh Singh,
	Hugh Dickins, Tim Murray, Alexander Viro, Andrew Morton,
	Linux-Fsdevel, open list, Linux-MM

Sounds good.

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On Apr 6, 2021, at 11:48 AM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
> 
> Instrumenting filemap_nr_thps_inc() should be sufficient for ensuring
> writable file mappings will not be THP-backed.
> 
> If filemap_nr_thps_dec() in unaccount_page_cache_page() and
> filemap_nr_thps() in do_dentry_open() race, the worst case is an
> unnecessary truncation. We could introduce a memory barrier in
> unaccount_page_cache_page(), but I'm unsure it would significantly
> mitigate the risk of spurious truncations. Barring synchronization
> between do_dentry_open() and ongoing page cache operations, there
> could be an in-progress delete_from_page_cache_batch() that has not
> yet updated accounting for THPs in its targeted range.
> 
> -- Collin
> 
> On Mon, Apr 5, 2021 at 7:05 PM William Kucharski
> <william.kucharski@oracle.com> wrote:
>> 
>> 
>> I saw a similar change a few years ago with my prototype:
>> 
>> https://lore.kernel.org/linux-mm/5BB682E1-DD52-4AA9-83E9-DEF091E0C709@oracle.com/
>> 
>> the key being a very nice drop in iTLB-load-misses, so it looks like your code is
>> having the right effect.
>> 
>> What about the call to filemap_nr_thps_dec() in unaccount_page_cache_page() - does
>> that need an smp_mb() as well?
>> 
>> -- Bill
>> 
>>> On Apr 5, 2021, at 6:15 PM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
>>> 
>>> v2 has been uploaded with performance testing results:
>>> https://lore.kernel.org/patchwork/patch/1408266/
>>> 
>>> 
>>> 
>>> On Mon, Mar 22, 2021 at 2:59 PM Collin Fijalkovich
>>> <cfijalkovich@google.com> wrote:
>>>> 
>>>> Transparent huge pages are supported for read-only non-shmem filesystems,
>>>> but are only used for vmas with VM_DENYWRITE. This condition ensures that
>>>> file THPs are protected from writes while an application is running
>>>> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
>>>> when a file is opened for write in do_dentry_open(). Since sys_mmap
>>>> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
>>>> produced by execve().
>>>> 
>>>> Systems that make heavy use of shared libraries (e.g. Android) are unable
>>>> to apply VM_DENYWRITE through the dynamic linker, preventing them from
>>>> benefiting from the resultant reduced contention on the TLB.
>>>> 
>>>> This patch reduces the constraint on file THPs allowing use with any
>>>> executable mapping from a file not opened for write (see
>>>> inode_is_open_for_write()). It also introduces additional conditions to
>>>> ensure that files opened for write will never be backed by file THPs.
>>>> 
>>>> Restricting the use of THPs to executable mappings eliminates the risk that
>>>> a read-only file later opened for write would encounter significant
>>>> latencies due to page cache truncation.
>>>> 
>>>> The ld linker flag '-z max-page-size=(hugepage size)' can be used to
>>>> produce executables with the necessary layout. The dynamic linker must
>>>> map these file's segments at a hugepage size aligned vma for the mapping to
>>>> be backed with THPs.
>>>> 
>>>> Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
>>>> ---
>>>> fs/open.c       | 13 +++++++++++--
>>>> mm/khugepaged.c | 16 +++++++++++++++-
>>>> 2 files changed, 26 insertions(+), 3 deletions(-)
>>>> 
>>>> diff --git a/fs/open.c b/fs/open.c
>>>> index e53af13b5835..f76e960d10ea 100644
>>>> --- a/fs/open.c
>>>> +++ b/fs/open.c
>>>> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
>>>>        * XXX: Huge page cache doesn't support writing yet. Drop all page
>>>>        * cache for this file before processing writes.
>>>>        */
>>>> -       if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
>>>> -               truncate_pagecache(inode, 0);
>>>> +       if (f->f_mode & FMODE_WRITE) {
>>>> +               /*
>>>> +                * Paired with smp_mb() in collapse_file() to ensure nr_thps
>>>> +                * is up to date and the update to i_writecount by
>>>> +                * get_write_access() is visible. Ensures subsequent insertion
>>>> +                * of THPs into the page cache will fail.
>>>> +                */
>>>> +               smp_mb();
>>>> +               if (filemap_nr_thps(inode->i_mapping))
>>>> +                       truncate_pagecache(inode, 0);
>>>> +       }
>>>> 
>>>>       return 0;
>>>> 
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index a7d6cb912b05..4c7cc877d5e3 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>>>> 
>>>>       /* Read-only file mappings need to be aligned for THP to work. */
>>>>       if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
>>>> -           (vm_flags & VM_DENYWRITE)) {
>>>> +           !inode_is_open_for_write(vma->vm_file->f_inode) &&
>>>> +           (vm_flags & VM_EXEC)) {
>>>>               return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>>>>                               HPAGE_PMD_NR);
>>>>       }
>>>> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
>>>>       else {
>>>>               __mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
>>>>               filemap_nr_thps_inc(mapping);
>>>> +               /*
>>>> +                * Paired with smp_mb() in do_dentry_open() to ensure
>>>> +                * i_writecount is up to date and the update to nr_thps is
>>>> +                * visible. Ensures the page cache will be truncated if the
>>>> +                * file is opened writable.
>>>> +                */
>>>> +               smp_mb();
>>>> +               if (inode_is_open_for_write(mapping->host)) {
>>>> +                       result = SCAN_FAIL;
>>>> +                       __mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
>>>> +                       filemap_nr_thps_dec(mapping);
>>>> +                       goto xa_locked;
>>>> +               }
>>>>       }
>>>> 
>>>>       if (nr_none) {
>>>> --
>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>> 
>>> 
>> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-04-06 22:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-22 21:58 [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Collin Fijalkovich
2021-03-22 23:55 ` Song Liu
     [not found]   ` <CAL+PeoGfCmbMSEYgaJNPHWfLvmmXJGaEM5G6rFstKzhTeY=2yw@mail.gmail.com>
2021-03-28 16:45     ` Song Liu
2021-03-30 20:00       ` Collin Fijalkovich
2021-03-24  3:51 ` William Kucharski
2021-04-06  0:15 ` Collin Fijalkovich
2021-04-06  2:04   ` William Kucharski
2021-04-06 17:48     ` Collin Fijalkovich
2021-04-06 22:26       ` William Kucharski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).