linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] hugetlbfs: use i_mmap_rwsem for more synchronization
@ 2020-03-16 20:57 Mike Kravetz
  2020-03-16 20:57 ` [PATCH v2 1/2] hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization Mike Kravetz
  2020-03-16 20:57 ` [PATCH v2 2/2] hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race Mike Kravetz
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Kravetz @ 2020-03-16 20:57 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Hugh Dickins, Naoya Horiguchi, Aneesh Kumar K . V,
	Andrea Arcangeli, Kirill A . Shutemov, Davidlohr Bueso,
	Prakash Sangappa, Andrew Morton, Mike Kravetz

v2
- Fixed a hang that could be reproduced via an LTP test [4].
  Note that the issue was in one of the return paths of one of the
  callers of hugetlb_page_mapping_lock_write which left a huge page
  locked.  The routine hugetlb_page_mapping_lock_write was not modified
  in v2, and is still in need of review/comments.
- Cleaned up warnings produced on powerpc builds [5].

While discussing the issue with huge_pte_offset [1], I remembered that
there were more outstanding hugetlb races.  These issues are:

1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
   invalid via a call to huge_pmd_unshare by another thread.
2) hugetlbfs page faults can race with truncation causing invalid global
   reserve counts and state.

A previous attempt was made to use i_mmap_rwsem in this manner as described
at [2].  However, those patches were reverted starting with [3] due to
locking issues.

To effectively use i_mmap_rwsem to address the above issues it needs to
be held (in read mode) during page fault processing.  However, during
fault processing we need to lock the page we will be adding.  The existing
lock ordering requires that the page lock be taken before i_mmap_rwsem.  Waiting until
after taking the page lock is too late in the fault process for the
synchronization we want to do.

To address this lock ordering issue, the following patches change the
lock ordering for hugetlb pages.  This is not too invasive as hugetlbfs
processing is done separately from core mm in many places.  However, I
don't really like this idea.  Much ugliness is contained in the new
routine hugetlb_page_mapping_lock_write() of patch 1.

The only other way I can think of to address these issues is by catching
all the races and then cleaning up, backing out, and retrying as needed.
This can get really ugly, especially for huge page reservations.
At one time, I started writing some of the reservation backout code for
page faults and it got so ugly and complicated I went down the path of
adding synchronization to avoid the races.  Any other suggestions would
be welcome.

[1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/
[2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/
[3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com
[4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/
[5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/

Mike Kravetz (2):
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race

 fs/hugetlbfs/inode.c    |  30 +++++--
 include/linux/fs.h      |   5 ++
 include/linux/hugetlb.h |   8 ++
 mm/hugetlb.c            | 175 +++++++++++++++++++++++++++++++++++-----
 mm/memory-failure.c     |  29 ++++++-
 mm/migrate.c            |  25 +++++-
 mm/rmap.c               |  17 +++-
 mm/userfaultfd.c        |  11 ++-
 8 files changed, 263 insertions(+), 37 deletions(-)

-- 
2.24.1



* [PATCH v2 0/2] hugetlbfs: use i_mmap_rwsem for better synchronization
@ 2018-12-18 22:35 Mike Kravetz
  2018-12-18 22:35 ` [PATCH v2 1/2] hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization Mike Kravetz
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2018-12-18 22:35 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Hugh Dickins, Naoya Horiguchi, Aneesh Kumar K . V,
	Andrea Arcangeli, Kirill A . Shutemov, Davidlohr Bueso,
	Prakash Sangappa, Andrew Morton, Mike Kravetz

There are two primary issues addressed here:
1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
   invalid via a call to huge_pmd_unshare by another thread.
2) hugetlbfs page faults can race with truncation causing invalid global
   reserve counts and state.
Both issues are addressed by expanding the use of i_mmap_rwsem.

These issues have existed for a long time.  They can be recreated with a
test program that causes page fault/truncation races.  For simple mappings,
this results in a negative HugePages_Rsvd count.  If racing with mappings
that contain shared pmds, we can hit "BUG at fs/hugetlbfs/inode.c:444!" or
an Oops as the result of an invalid memory reference.
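
When running such a test, the reserve count can be watched through the
HugePages_Rsvd line of /proc/meminfo.  The helper below is a small sketch for
doing so; meminfo_field() and rsvd_looks_underflowed() are hypothetical
names.  It assumes the counter is printed as an unsigned value, in which case
a "negative" count would show up as a very large number rather than with a
minus sign.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look for a line starting with "key:" in meminfo-style text and parse
 * the number that follows.  Returns 0 on success, -1 if the key is
 * missing or has no numeric value.
 */
static int meminfo_field(const char *text, const char *key,
			 unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			const char *start = line + klen + 1;
			char *end;
			unsigned long long v = strtoull(start, &end, 10);

			if (end == start)
				return -1;
			*out = v;
			return 0;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}

/* Heuristic: a value in the top half of the range is almost surely a wrap. */
static int rsvd_looks_underflowed(unsigned long long v)
{
	return v > ULLONG_MAX / 2;
}

/* Read /proc/meminfo and report HugePages_Rsvd; returns -1 on any error. */
static int read_hugepages_rsvd(unsigned long long *out)
{
	char buf[8192];
	size_t n;
	FILE *fp = fopen("/proc/meminfo", "r");

	if (!fp)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, fp);
	fclose(fp);
	buf[n] = '\0';
	return meminfo_field(buf, "HugePages_Rsvd", out);
}
```

Calling read_hugepages_rsvd() before and after a test run and passing the
result to rsvd_looks_underflowed() gives a quick check for the symptom
described above.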

v1 -> v2
  Combined patches 2 and 3 of v1 series as suggested by Aneesh.  No other
  changes were made.
Patches are a follow up to the RFC,
  http://lkml.kernel.org/r/20181024045053.1467-1-mike.kravetz@oracle.com
  Comments made by Naoya were addressed.

Mike Kravetz (2):
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race

 fs/hugetlbfs/inode.c | 50 +++++++++----------------
 mm/hugetlb.c         | 87 +++++++++++++++++++++++++++++++-------------
 mm/memory-failure.c  | 14 ++++++-
 mm/migrate.c         | 13 ++++++-
 mm/rmap.c            |  3 ++
 mm/userfaultfd.c     | 11 +++++-
 6 files changed, 116 insertions(+), 62 deletions(-)

-- 
2.17.2


end of thread, other threads:[~2020-03-31 18:40 UTC | newest]

Thread overview: 11+ messages
2020-03-16 20:57 [PATCH v2 0/2] hugetlbfs: use i_mmap_rwsem for more synchronization Mike Kravetz
2020-03-16 20:57 ` [PATCH v2 1/2] hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization Mike Kravetz
2020-03-30 13:30   ` Naresh Kamboju
2020-03-30 14:01     ` Naresh Kamboju
2020-03-30 23:35       ` Mike Kravetz
2020-03-31 18:40         ` Mike Kravetz
2020-03-16 20:57 ` [PATCH v2 2/2] hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race Mike Kravetz
  -- strict thread matches above, loose matches on Subject: below --
2018-12-18 22:35 [PATCH v2 0/2] hugetlbfs: use i_mmap_rwsem for better synchronization Mike Kravetz
2018-12-18 22:35 ` [PATCH v2 1/2] hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization Mike Kravetz
2018-12-19  1:24   ` Sasha Levin
2018-12-21 10:05   ` Kirill A. Shutemov
2018-12-21 18:20     ` Mike Kravetz
