linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Collin Fijalkovich <cfijalkovich@google.com>
To: unlisted-recipients:; (no To-header on input)
Cc: songliubraving@fb.com, surenb@google.com, hridya@google.com,
	kaleshsingh@google.com, hughd@google.com, timmurray@google.com,
	Collin Fijalkovich <cfijalkovich@google.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
Date: Mon, 22 Mar 2021 14:58:23 -0700	[thread overview]
Message-ID: <20210322215823.962758-1-cfijalkovich@google.com> (raw)

Transparent huge pages are supported for read-only non-shmem filesystems,
but are only used for vmas with VM_DENYWRITE. This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY).  Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open(). Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().

Systems that make heavy use of shared libraries (e.g. Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.

This patch reduces the constraint on file THPs allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()). It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.

Restricting the use of THPs to executable mappings eliminates the risk that
a read-only file later opened for write would encounter significant
latencies due to page cache truncation.

The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout. The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping to
be backed with THPs.

Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
---
 fs/open.c       | 13 +++++++++++--
 mm/khugepaged.c | 16 +++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e53af13b5835..f76e960d10ea 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
 	 * XXX: Huge page cache doesn't support writing yet. Drop all page
 	 * cache for this file before processing writes.
 	 */
-	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
-		truncate_pagecache(inode, 0);
+	if (f->f_mode & FMODE_WRITE) {
+		/*
+		 * Paired with smp_mb() in collapse_file() to ensure nr_thps
+		 * is up to date and the update to i_writecount by
+		 * get_write_access() is visible. Ensures subsequent insertion
+		 * of THPs into the page cache will fail.
+		 */
+		smp_mb();
+		if (filemap_nr_thps(inode->i_mapping))
+			truncate_pagecache(inode, 0);
+	}
 
 	return 0;
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a7d6cb912b05..4c7cc877d5e3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 
 	/* Read-only file mappings need to be aligned for THP to work. */
 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
-	    (vm_flags & VM_DENYWRITE)) {
+	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
+	    (vm_flags & VM_EXEC)) {
 		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
 				HPAGE_PMD_NR);
 	}
@@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
 	else {
 		__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
 		filemap_nr_thps_inc(mapping);
+		/*
+		 * Paired with smp_mb() in do_dentry_open() to ensure
+		 * i_writecount is up to date and the update to nr_thps is
+		 * visible. Ensures the page cache will be truncated if the
+		 * file is opened writable.
+		 */
+		smp_mb();
+		if (inode_is_open_for_write(mapping->host)) {
+			result = SCAN_FAIL;
+			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
+			filemap_nr_thps_dec(mapping);
+			goto xa_locked;
+		}
 	}
 
 	if (nr_none) {
-- 
2.31.0.rc2.261.g7f71774620-goog


             reply	other threads:[~2021-03-22 22:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-22 21:58 Collin Fijalkovich [this message]
2021-03-22 23:55 ` [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs Song Liu
     [not found]   ` <CAL+PeoGfCmbMSEYgaJNPHWfLvmmXJGaEM5G6rFstKzhTeY=2yw@mail.gmail.com>
2021-03-28 16:45     ` Song Liu
2021-03-30 20:00       ` Collin Fijalkovich
2021-03-24  3:51 ` William Kucharski
2021-04-06  0:15 ` Collin Fijalkovich
2021-04-06  2:04   ` William Kucharski
2021-04-06 17:48     ` Collin Fijalkovich
2021-04-06 22:26       ` William Kucharski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210322215823.962758-1-cfijalkovich@google.com \
    --to=cfijalkovich@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=hridya@google.com \
    --cc=hughd@google.com \
    --cc=kaleshsingh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=songliubraving@fb.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).