linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
@ 2021-04-06  0:09 Collin Fijalkovich
  2021-04-17  4:31 ` Hugh Dickins
  2021-04-22  0:08 ` Song Liu
  0 siblings, 2 replies; 3+ messages in thread
From: Collin Fijalkovich @ 2021-04-06  0:09 UTC (permalink / raw)
  Cc: songliubraving, surenb, hridya, kaleshsingh, hughd, timmurray,
	william.kucharski, akpm, willy, Collin Fijalkovich,
	Alexander Viro, linux-fsdevel, linux-kernel, linux-mm

Transparent huge pages are supported for read-only non-shmem files,
but are only used for vmas with VM_DENYWRITE. This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY).  Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open(). Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().

Systems that make heavy use of shared libraries (e.g. Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.

This patch reduces the constraint on file THPs allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()). It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.

Restricting the use of THPs to executable mappings eliminates the risk that
a read-only file later opened for write would encounter significant
latencies due to page cache truncation.

The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout. The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping to
be backed with THPs.

Comparison of the performance characteristics of 4KB and 2MB-backed
libraries follows; the Android dex2oat tool was used to AOT compile an
example application on a single ARM core.

4KB Pages:
==========

count              event_name            # count / runtime
598,995,035,942    cpu-cycles            # 1.800861 GHz
 81,195,620,851    raw-stall-frontend    # 244.112 M/sec
347,754,466,597    iTLB-loads            # 1.046 G/sec
  2,970,248,900    iTLB-load-misses      # 0.854122% miss rate

Total test time: 332.854998 seconds.

2MB Pages:
==========

count              event_name            # count / runtime
592,872,663,047    cpu-cycles            # 1.800358 GHz
 76,485,624,143    raw-stall-frontend    # 232.261 M/sec
350,478,413,710    iTLB-loads            # 1.064 G/sec
    803,233,322    iTLB-load-misses      # 0.229182% miss rate

Total test time: 329.826087 seconds

A check of /proc/$(pidof dex2oat64)/smaps shows THPs in use:

/apex/com.android.art/lib64/libart.so
FilePmdMapped:      4096 kB

/apex/com.android.art/lib64/libart-compiler.so
FilePmdMapped:      2048 kB

Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
---
Changes v1 -> v2:
* commit message 'non-shmem filesystems' -> 'non-shmem files'
* Add performance testing data to commit message

 fs/open.c       | 13 +++++++++++--
 mm/khugepaged.c | 16 +++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e53af13b5835..f76e960d10ea 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
 	 * XXX: Huge page cache doesn't support writing yet. Drop all page
 	 * cache for this file before processing writes.
 	 */
-	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
-		truncate_pagecache(inode, 0);
+	if (f->f_mode & FMODE_WRITE) {
+		/*
+		 * Paired with smp_mb() in collapse_file() to ensure nr_thps
+		 * is up to date and the update to i_writecount by
+		 * get_write_access() is visible. Ensures subsequent insertion
+		 * of THPs into the page cache will fail.
+		 */
+		smp_mb();
+		if (filemap_nr_thps(inode->i_mapping))
+			truncate_pagecache(inode, 0);
+	}
 
 	return 0;
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a7d6cb912b05..4c7cc877d5e3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 
 	/* Read-only file mappings need to be aligned for THP to work. */
 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
-	    (vm_flags & VM_DENYWRITE)) {
+	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
+	    (vm_flags & VM_EXEC)) {
 		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
 				HPAGE_PMD_NR);
 	}
@@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
 	else {
 		__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
 		filemap_nr_thps_inc(mapping);
+		/*
+		 * Paired with smp_mb() in do_dentry_open() to ensure
+		 * i_writecount is up to date and the update to nr_thps is
+		 * visible. Ensures the page cache will be truncated if the
+		 * file is opened writable.
+		 */
+		smp_mb();
+		if (inode_is_open_for_write(mapping->host)) {
+			result = SCAN_FAIL;
+			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
+			filemap_nr_thps_dec(mapping);
+			goto xa_locked;
+		}
 	}
 
 	if (nr_none) {
-- 
2.31.0.208.g409f899ff0-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-04-06  0:09 [PATCH v2] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Collin Fijalkovich
@ 2021-04-17  4:31 ` Hugh Dickins
  2021-04-22  0:08 ` Song Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Hugh Dickins @ 2021-04-17  4:31 UTC (permalink / raw)
  To: Collin Fijalkovich
  Cc: Andrew Morton, songliubraving, surenb, hridya, kaleshsingh,
	hughd, timmurray, william.kucharski, willy, Alexander Viro,
	linux-fsdevel, linux-kernel, linux-mm

On Mon, 5 Apr 2021, Collin Fijalkovich wrote:

> Transparent huge pages are supported for read-only non-shmem files,
> but are only used for vmas with VM_DENYWRITE. This condition ensures that
> file THPs are protected from writes while an application is running
> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> when a file is opened for write in do_dentry_open(). Since sys_mmap
> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> produced by execve().
> 
> Systems that make heavy use of shared libraries (e.g. Android) are unable
> to apply VM_DENYWRITE through the dynamic linker, preventing them from
> benefiting from the resultant reduced contention on the TLB.
> 
> This patch reduces the constraint on file THPs allowing use with any
> executable mapping from a file not opened for write (see
> inode_is_open_for_write()). It also introduces additional conditions to
> ensure that files opened for write will never be backed by file THPs.
> 
> Restricting the use of THPs to executable mappings eliminates the risk that
> a read-only file later opened for write would encounter significant
> latencies due to page cache truncation.
> 
> The ld linker flag '-z max-page-size=(hugepage size)' can be used to
> produce executables with the necessary layout. The dynamic linker must
> map these file's segments at a hugepage size aligned vma for the mapping to
> be backed with THPs.
> 
> Comparison of the performance characteristics of 4KB and 2MB-backed
> libraries follows; the Android dex2oat tool was used to AOT compile an
> example application on a single ARM core.
> 
> 4KB Pages:
> ==========
> 
> count              event_name            # count / runtime
> 598,995,035,942    cpu-cycles            # 1.800861 GHz
>  81,195,620,851    raw-stall-frontend    # 244.112 M/sec
> 347,754,466,597    iTLB-loads            # 1.046 G/sec
>   2,970,248,900    iTLB-load-misses      # 0.854122% miss rate
> 
> Total test time: 332.854998 seconds.
> 
> 2MB Pages:
> ==========
> 
> count              event_name            # count / runtime
> 592,872,663,047    cpu-cycles            # 1.800358 GHz
>  76,485,624,143    raw-stall-frontend    # 232.261 M/sec
> 350,478,413,710    iTLB-loads            # 1.064 G/sec
>     803,233,322    iTLB-load-misses      # 0.229182% miss rate
> 
> Total test time: 329.826087 seconds
> 
> A check of /proc/$(pidof dex2oat64)/smaps shows THPs in use:
> 
> /apex/com.android.art/lib64/libart.so
> FilePmdMapped:      4096 kB
> 
> /apex/com.android.art/lib64/libart-compiler.so
> FilePmdMapped:      2048 kB
> 
> Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

and you also won

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

in the v1 thread.

I had hoped to see a more dramatic difference in the numbers above,
but I'm a performance naif, and presume other loads and other
libraries may show further benefit.

> ---
> Changes v1 -> v2:
> * commit message 'non-shmem filesystems' -> 'non-shmem files'
> * Add performance testing data to commit message
> 
>  fs/open.c       | 13 +++++++++++--
>  mm/khugepaged.c | 16 +++++++++++++++-
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index e53af13b5835..f76e960d10ea 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
>  	 * XXX: Huge page cache doesn't support writing yet. Drop all page
>  	 * cache for this file before processing writes.
>  	 */
> -	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
> -		truncate_pagecache(inode, 0);
> +	if (f->f_mode & FMODE_WRITE) {
> +		/*
> +		 * Paired with smp_mb() in collapse_file() to ensure nr_thps
> +		 * is up to date and the update to i_writecount by
> +		 * get_write_access() is visible. Ensures subsequent insertion
> +		 * of THPs into the page cache will fail.
> +		 */
> +		smp_mb();
> +		if (filemap_nr_thps(inode->i_mapping))
> +			truncate_pagecache(inode, 0);
> +	}
>  
>  	return 0;
>  
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a7d6cb912b05..4c7cc877d5e3 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>  
>  	/* Read-only file mappings need to be aligned for THP to work. */
>  	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> -	    (vm_flags & VM_DENYWRITE)) {
> +	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
> +	    (vm_flags & VM_EXEC)) {
>  		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>  				HPAGE_PMD_NR);
>  	}
> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
>  	else {
>  		__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
>  		filemap_nr_thps_inc(mapping);
> +		/*
> +		 * Paired with smp_mb() in do_dentry_open() to ensure
> +		 * i_writecount is up to date and the update to nr_thps is
> +		 * visible. Ensures the page cache will be truncated if the
> +		 * file is opened writable.
> +		 */
> +		smp_mb();
> +		if (inode_is_open_for_write(mapping->host)) {
> +			result = SCAN_FAIL;
> +			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
> +			filemap_nr_thps_dec(mapping);
> +			goto xa_locked;
> +		}
>  	}
>  
>  	if (nr_none) {
> -- 
> 2.31.0.208.g409f899ff0-goog

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
  2021-04-06  0:09 [PATCH v2] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Collin Fijalkovich
  2021-04-17  4:31 ` Hugh Dickins
@ 2021-04-22  0:08 ` Song Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Song Liu @ 2021-04-22  0:08 UTC (permalink / raw)
  To: Collin Fijalkovich
  Cc: Suren Baghdasaryan, hridya, kaleshsingh, hughd, timmurray,
	william.kucharski, akpm, willy, Alexander Viro, linux-fsdevel,
	linux-kernel, linux-mm



> On Apr 5, 2021, at 5:09 PM, Collin Fijalkovich <cfijalkovich@google.com> wrote:
> 
> Transparent huge pages are supported for read-only non-shmem files,
> but are only used for vmas with VM_DENYWRITE. This condition ensures that
> file THPs are protected from writes while an application is running
> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> when a file is opened for write in do_dentry_open(). Since sys_mmap
> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> produced by execve().
> 
> Systems that make heavy use of shared libraries (e.g. Android) are unable
> to apply VM_DENYWRITE through the dynamic linker, preventing them from
> benefiting from the resultant reduced contention on the TLB.
> 
> This patch reduces the constraint on file THPs allowing use with any
> executable mapping from a file not opened for write (see
> inode_is_open_for_write()). It also introduces additional conditions to
> ensure that files opened for write will never be backed by file THPs.
> 
> Restricting the use of THPs to executable mappings eliminates the risk that
> a read-only file later opened for write would encounter significant
> latencies due to page cache truncation.
> 
> The ld linker flag '-z max-page-size=(hugepage size)' can be used to
> produce executables with the necessary layout. The dynamic linker must
> map these file's segments at a hugepage size aligned vma for the mapping to
> be backed with THPs.
> 
> Comparison of the performance characteristics of 4KB and 2MB-backed
> libraries follows; the Android dex2oat tool was used to AOT compile an
> example application on a single ARM core.
> 
> 4KB Pages:
> ==========
> 
> count              event_name            # count / runtime
> 598,995,035,942    cpu-cycles            # 1.800861 GHz
> 81,195,620,851    raw-stall-frontend    # 244.112 M/sec
> 347,754,466,597    iTLB-loads            # 1.046 G/sec
>  2,970,248,900    iTLB-load-misses      # 0.854122% miss rate
> 
> Total test time: 332.854998 seconds.
> 
> 2MB Pages:
> ==========
> 
> count              event_name            # count / runtime
> 592,872,663,047    cpu-cycles            # 1.800358 GHz
> 76,485,624,143    raw-stall-frontend    # 232.261 M/sec
> 350,478,413,710    iTLB-loads            # 1.064 G/sec
>    803,233,322    iTLB-load-misses      # 0.229182% miss rate
> 
> Total test time: 329.826087 seconds
> 
> A check of /proc/$(pidof dex2oat64)/smaps shows THPs in use:
> 
> /apex/com.android.art/lib64/libart.so
> FilePmdMapped:      4096 kB
> 
> /apex/com.android.art/lib64/libart-compiler.so
> FilePmdMapped:      2048 kB
> 
> Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>

Acked-by: Song Liu <song@kernel.org>

> ---
> Changes v1 -> v2:
> * commit message 'non-shmem filesystems' -> 'non-shmem files'
> * Add performance testing data to commit message
> 
> fs/open.c       | 13 +++++++++++--
> mm/khugepaged.c | 16 +++++++++++++++-
> 2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index e53af13b5835..f76e960d10ea 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
> 	 * XXX: Huge page cache doesn't support writing yet. Drop all page
> 	 * cache for this file before processing writes.
> 	 */
> -	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
> -		truncate_pagecache(inode, 0);
> +	if (f->f_mode & FMODE_WRITE) {
> +		/*
> +		 * Paired with smp_mb() in collapse_file() to ensure nr_thps
> +		 * is up to date and the update to i_writecount by
> +		 * get_write_access() is visible. Ensures subsequent insertion
> +		 * of THPs into the page cache will fail.
> +		 */
> +		smp_mb();
> +		if (filemap_nr_thps(inode->i_mapping))
> +			truncate_pagecache(inode, 0);
> +	}
> 
> 	return 0;
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a7d6cb912b05..4c7cc877d5e3 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
> 
> 	/* Read-only file mappings need to be aligned for THP to work. */
> 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> -	    (vm_flags & VM_DENYWRITE)) {
> +	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
> +	    (vm_flags & VM_EXEC)) {
> 		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> 				HPAGE_PMD_NR);
> 	}
> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
> 	else {
> 		__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
> 		filemap_nr_thps_inc(mapping);
> +		/*
> +		 * Paired with smp_mb() in do_dentry_open() to ensure
> +		 * i_writecount is up to date and the update to nr_thps is
> +		 * visible. Ensures the page cache will be truncated if the
> +		 * file is opened writable.
> +		 */
> +		smp_mb();
> +		if (inode_is_open_for_write(mapping->host)) {
> +			result = SCAN_FAIL;
> +			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
> +			filemap_nr_thps_dec(mapping);
> +			goto xa_locked;
> +		}
> 	}
> 
> 	if (nr_none) {
> -- 
> 2.31.0.208.g409f899ff0-goog
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-04-22  0:09 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-06  0:09 [PATCH v2] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Collin Fijalkovich
2021-04-17  4:31 ` Hugh Dickins
2021-04-22  0:08 ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).