* [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing
@ 2019-11-07 19:06 Waiman Long
  2019-11-07 19:13 ` Waiman Long
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Waiman Long @ 2019-11-07 19:06 UTC (permalink / raw)
  To: Mike Kravetz, Andrew Morton
  Cc: linux-kernel, linux-mm, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Waiman Long

A customer with large SMP systems (up to 16 sockets) and an application
that uses a large amount of static hugepages (~500-1500GB) is experiencing
random multisecond delays. These delays were caused by the long time it
took to scan the VMA interval tree with mmap_sem held.

Sharing a huge PMD does not require changes to i_mmap at all. As a
result, we can just take the read lock and let other threads search
for the right VMA to share in parallel. Once the right VMA is found,
either the PMD lock (for 2MB huge pages on x86-64) or
mm->page_table_lock will be acquired to perform the actual PMD sharing.
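
For context, i_mmap_lock_read() and i_mmap_unlock_read() are thin
wrappers around the mapping's i_mmap_rwsem, as defined in
include/linux/fs.h:

  static inline void i_mmap_lock_read(struct address_space *mapping)
  {
          down_read(&mapping->i_mmap_rwsem);
  }

  static inline void i_mmap_unlock_read(struct address_space *mapping)
  {
          up_read(&mapping->i_mmap_rwsem);
  }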

Lock contention, if present, will happen on the spinlock. That is much
better than contention on the rwsem, where the time needed to scan the
interval tree is indeterminate.
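
To illustrate the two-phase locking, the flow in huge_pmd_share() with
this patch applied looks roughly like this (a simplified sketch of the
patched function; helper details and page refcounting elided):

  i_mmap_lock_read(mapping);    /* scan phase: i_mmap is not modified */
  vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
          /* look for a shareable PMD in another mapping of the file */
  }
  if (spte) {
          ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
          if (pud_none(*pud))   /* actual sharing happens under the spinlock */
                  pud_populate(mm, pud,
                          (pmd_t *)((unsigned long)spte & PAGE_MASK));
          spin_unlock(ptl);
  }
  pte = (pte_t *)pmd_alloc(mm, pud, addr);
  i_mmap_unlock_read(mapping);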

With this patch applied, the customer is seeing significant improvements
over the unpatched kernel.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/hugetlb.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b45a95363a84..087e7ff00137 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4842,7 +4842,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (!vma_shareable(vma, addr))
 		return (pte_t *)pmd_alloc(mm, pud, addr);
 
-	i_mmap_lock_write(mapping);
+	/*
+	 * PMD sharing does not require changes to i_mmap. So a read lock
+	 * is enough.
+	 */
+	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -4872,7 +4876,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-	i_mmap_unlock_write(mapping);
+	i_mmap_unlock_read(mapping);
 	return pte;
 }
 
-- 
2.18.1



Thread overview: 17+ messages
2019-11-07 19:06 [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing Waiman Long
2019-11-07 19:13 ` Waiman Long
2019-11-07 19:15 ` Waiman Long
2019-11-07 19:31 ` Mike Kravetz
2019-11-07 19:42 ` Matthew Wilcox
2019-11-07 21:06   ` Waiman Long
2019-11-07 19:54 ` Matthew Wilcox
2019-11-07 21:27   ` Waiman Long
2019-11-07 21:49   ` Mike Kravetz
2019-11-07 21:56     ` Mike Kravetz
2019-11-08  2:04     ` Davidlohr Bueso
2019-11-08  3:22       ` Andrew Morton
2019-11-08 19:10       ` Mike Kravetz
2019-11-09  1:47         ` Mike Kravetz
2019-11-12 17:27           ` Waiman Long
2019-11-12 23:11             ` Mike Kravetz
2019-11-13  2:55               ` Waiman Long
