linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/5] hugetlbfs: Disable PMD sharing for large systems
@ 2019-09-11 15:05 Waiman Long
  2019-09-11 15:05 ` [PATCH 1/5] locking/rwsem: Add down_write_timedlock() Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Waiman Long @ 2019-09-11 15:05 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Alexander Viro, Mike Kravetz
  Cc: linux-kernel, linux-fsdevel, linux-mm, Davidlohr Bueso, Waiman Long

A customer running large SMP systems (up to 16 sockets) with an application
that uses a large amount of static hugepages (~500-1500GB) is experiencing
random multisecond delays. These delays were caused by the long time it
took to scan the VMA interval tree with mmap_sem held.

To fix this problem while preserving existing behavior as much as
possible, we need to allow a timeout in down_write() and disable PMD
sharing when it is taking too long to do so. Since a transaction can
involve touching multiple huge pages, timing out for each of the huge
page interactions does not completely solve the problem. So a threshold
is set to completely disable PMD sharing if too many timeouts happen.

The first 4 patches of this 5-patch series add a new
down_write_timedlock() API which accepts a timeout argument and returns
true if locking is successful or false otherwise. It works more or less
like down_write_trylock(), but the calling thread may sleep.
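
For readers who want to experiment with these semantics outside the
kernel, here is a rough userspace analogy built on the POSIX
pthread_rwlock_timedwrlock() call. It is not the rwsem implementation
from this series: write_lock_timed() is a hypothetical helper, and the
nanosecond timeout unit is an assumption made for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical userspace stand-in for the described behaviour: try to
 * take the write lock, sleeping if needed, and give up after timeout_ns
 * nanoseconds. Returns true if the lock was acquired, false otherwise.
 */
static bool write_lock_timed(pthread_rwlock_t *lock, long timeout_ns)
{
	struct timespec deadline;

	assert(lock != NULL && timeout_ns >= 0);

	/* pthread_rwlock_timedwrlock() expects an absolute CLOCK_REALTIME time. */
	if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
		return false;

	deadline.tv_sec += timeout_ns / NSEC_PER_SEC;
	deadline.tv_nsec += timeout_ns % NSEC_PER_SEC;
	if (deadline.tv_nsec >= NSEC_PER_SEC) {
		deadline.tv_sec++;
		deadline.tv_nsec -= NSEC_PER_SEC;
	}

	/* Any non-zero return (e.g. ETIMEDOUT) is reported as failure. */
	return pthread_rwlock_timedwrlock(lock, &deadline) == 0;
}
```

As with a trylock, the caller must check the return value and only
unlock when it is true; unlike a trylock, the call may block for up to
the given timeout.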

The last patch implements the timeout mechanism as described above. With
the patched kernel installed, the customer confirmed that the problem
was gone.

Waiman Long (5):
  locking/rwsem: Add down_write_timedlock()
  locking/rwsem: Enable timeout check when spinning on owner
  locking/osq: Allow early break from OSQ
  locking/rwsem: Enable timeout check when staying in the OSQ
  hugetlbfs: Limit wait time when trying to share huge PMD

 include/linux/fs.h                |   7 ++
 include/linux/osq_lock.h          |  13 +--
 include/linux/rwsem.h             |   4 +-
 kernel/locking/lock_events_list.h |   1 +
 kernel/locking/mutex.c            |   2 +-
 kernel/locking/osq_lock.c         |  12 +-
 kernel/locking/rwsem.c            | 183 +++++++++++++++++++++++++-----
 mm/hugetlb.c                      |  24 +++-
 8 files changed, 201 insertions(+), 45 deletions(-)

-- 
2.18.1



Thread overview: 28+ messages
2019-09-11 15:05 [PATCH 0/5] hugetlbfs: Disable PMD sharing for large systems Waiman Long
2019-09-11 15:05 ` [PATCH 1/5] locking/rwsem: Add down_write_timedlock() Waiman Long
2019-09-11 15:05 ` [PATCH 2/5] locking/rwsem: Enable timeout check when spinning on owner Waiman Long
2019-09-11 15:05 ` [PATCH 3/5] locking/osq: Allow early break from OSQ Waiman Long
2019-09-11 15:05 ` [PATCH 4/5] locking/rwsem: Enable timeout check when staying in the OSQ Waiman Long
2019-09-11 15:05 ` [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD Waiman Long
2019-09-11 15:14   ` Matthew Wilcox
2019-09-11 15:44     ` Waiman Long
2019-09-11 17:03       ` Mike Kravetz
2019-09-11 17:15         ` Waiman Long
2019-09-11 17:22           ` Qian Cai
2019-09-11 17:28           ` Waiman Long
2019-09-11 16:01   ` Qian Cai
2019-09-11 16:34     ` Waiman Long
2019-09-11 19:42       ` Qian Cai
2019-09-11 20:54         ` Waiman Long
2019-09-11 21:57           ` Qian Cai
2019-09-11 19:57   ` Matthew Wilcox
2019-09-11 20:51     ` Waiman Long
2019-09-12  3:26   ` Mike Kravetz
2019-09-12  3:41     ` Matthew Wilcox
2019-09-12  4:40       ` Davidlohr Bueso
2019-09-16 13:53         ` Waiman Long
2019-09-12  9:06     ` Waiman Long
2019-09-12 16:43       ` Mike Kravetz
2019-09-13 18:23         ` Waiman Long
2019-09-13  1:50 ` [PATCH 0/5] hugetlbfs: Disable PMD sharing for large systems Dave Chinner
2019-09-25  8:35   ` Peter Zijlstra
