All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <matthew.wilcox@oracle.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Kernel Team <Kernel-team@fb.com>,
	William Kucharski <william.kucharski@oracle.com>,
	"srikar@linux.vnet.ibm.com" <srikar@linux.vnet.ibm.com>
Subject: Re: [PATCH v12 5/6] khugepaged: enable collapse pmd for pte-mapped THP
Date: Fri, 16 Aug 2019 17:54:43 +0300	[thread overview]
Message-ID: <20190816145443.6ard3iilytc6jlgv@box> (raw)
In-Reply-To: <20190813162451.GD6971@redhat.com>

On Tue, Aug 13, 2019 at 06:24:51PM +0200, Oleg Nesterov wrote:
> > Let me see first that my explanation makes sense :P
> 
> It does ;)

Does it look fine to you? It's on top of Song's patchset.

From 58834d6c1e63321af742b208558a6b5cb86fc7ec Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Fri, 16 Aug 2019 17:50:41 +0300
Subject: [PATCH] khugepaged: Add comments for retract_page_tables()

Oleg Nesterov pointed that logic behind checks in retract_page_tables()
are not obvious.

Add comments to clarify the reasoning for the checks and why they are
safe.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5c0a5f0826b2..00cec6a127aa 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1421,7 +1421,22 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 
 	i_mmap_lock_write(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
-		/* probably overkill */
+		/*
+		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
+		 * got written to. These VMAs are likely not worth investing
+		 * down_write(mmap_sem) as PMD-mapping is likely to be split
+		 * later.
+		 *
+		 * Not that vma->anon_vma check is racy: it can be set up after
+		 * the check but before we took mmap_sem by the fault path.
+		 * But page lock would prevent establishing any new ptes of the
+		 * page, so we are safe.
+		 *
+		 * An alternative would be drop the check, but check that page
+		 * table is clear before calling pmdp_collapse_flush() under
+		 * ptl. It has higher chance to recover THP for the VMA, but
+		 * has higher cost too.
+		 */
 		if (vma->anon_vma)
 			continue;
 		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
@@ -1434,9 +1449,10 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 			continue;
 		/*
 		 * We need exclusive mmap_sem to retract page table.
-		 * If trylock fails we would end up with pte-mapped THP after
-		 * re-fault. Not ideal, but it's more important to not disturb
-		 * the system too much.
+		 *
+		 * We use trylock due to lock inversion: we need to acquire
+		 * mmap_sem while holding page lock. Fault path does it in
+		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
 		if (down_write_trylock(&vma->vm_mm->mmap_sem)) {
 			spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
@@ -1446,8 +1462,10 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 			up_write(&vma->vm_mm->mmap_sem);
 			mm_dec_nr_ptes(vma->vm_mm);
 			pte_free(vma->vm_mm, pmd_pgtable(_pmd));
-		} else
+		} else {
+			/* Try again later */
 			khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+		}
 	}
 	i_mmap_unlock_write(mapping);
 }
-- 
 Kirill A. Shutemov

  reply	other threads:[~2019-08-16 14:54 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-07 23:37 [PATCH v12 0/6] THP aware uprobe Song Liu
2019-08-07 23:37 ` [PATCH v12 1/6] mm: move memcmp_pages() and pages_identical() Song Liu
2019-08-07 23:37 ` [PATCH v12 2/6] uprobe: use original page when all uprobes are removed Song Liu
2019-08-07 23:37 ` [PATCH v12 3/6] mm, thp: introduce FOLL_SPLIT_PMD Song Liu
2019-08-08 16:37   ` Oleg Nesterov
2019-08-08 17:16     ` Song Liu
2019-08-09 16:35       ` Oleg Nesterov
2019-08-09 16:50         ` Song Liu
2019-08-07 23:37 ` [PATCH v12 4/6] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT Song Liu
2019-08-07 23:37 ` [PATCH v12 5/6] khugepaged: enable collapse pmd for pte-mapped THP Song Liu
2019-08-08 16:33   ` Oleg Nesterov
2019-08-08 17:05     ` Song Liu
2019-08-09 15:24       ` Oleg Nesterov
2019-08-09 16:30         ` Song Liu
2019-08-09 18:01           ` Song Liu
2019-08-12 12:11             ` Kirill A. Shutemov
2019-08-12 13:22               ` Oleg Nesterov
2019-08-12 14:40                 ` Kirill A. Shutemov
2019-08-12 21:04                   ` Song Liu
2019-08-13 14:44                     ` Song Liu
2019-08-15 10:16                       ` Oleg Nesterov
2019-08-15 16:27                         ` Song Liu
2019-08-13 13:30                   ` Oleg Nesterov
2019-08-13 14:05                     ` Oleg Nesterov
2019-08-13 15:05                       ` Kirill A. Shutemov
2019-08-13 16:24                         ` Oleg Nesterov
2019-08-16 14:54                           ` Kirill A. Shutemov [this message]
2019-08-12 13:06             ` Oleg Nesterov
2019-08-12 14:36               ` Song Liu
2019-08-07 23:37 ` [PATCH v12 6/6] uprobe: collapse THP pmd after removing all uprobes Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190816145443.6ard3iilytc6jlgv@box \
    --to=kirill@shutemov.name \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.wilcox@oracle.com \
    --cc=oleg@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=william.kucharski@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.