linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] A couple hugetlbfs fixes
@ 2019-03-08 22:48 Mike Kravetz
  2019-03-08 22:48 ` [PATCH 1/2] huegtlbfs: on restore reserve error path retain subpool reservation Mike Kravetz
  2019-03-08 22:48 ` [PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz
  0 siblings, 2 replies; 4+ messages in thread
From: Mike Kravetz @ 2019-03-08 22:48 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Davidlohr Bueso
  Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov,
	Andrew Morton, Mike Kravetz

I stumbled on these two hugetlbfs issues while looking at other things:
- The 'restore reserve' functionality at page free time should not
  be adjusting subpool counts.
- A BUG can be triggered (not easily) due to temporarily mapping a
  page before doing a COW.

Both are described in detail in the commit message of the patches.
I would appreciate comments from Davidlohr Bueso as one patch is
directly related to code he added in commit 8382d914ebf7.

I did not cc stable as the first problem has been around since reserves
were added to hugetlbfs and nobody has noticed.  The second is very hard
to hit/reproduce.

Mike Kravetz (2):
  huegtlbfs: on restore reserve error path retain subpool reservation
  hugetlb: use same fault hash key for shared and private mappings

 mm/hugetlb.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] huegtlbfs: on restore reserve error path retain subpool reservation
  2019-03-08 22:48 [PATCH 0/2] A couple hugetlbfs fixes Mike Kravetz
@ 2019-03-08 22:48 ` Mike Kravetz
  2019-03-08 22:48 ` [PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz
  1 sibling, 0 replies; 4+ messages in thread
From: Mike Kravetz @ 2019-03-08 22:48 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Davidlohr Bueso
  Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov,
	Andrew Morton, Mike Kravetz

When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation.  When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored.  PagePrivate
being set at free huge page time mostly happens on error paths.

When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem.  If so,
pages are also reserved within the filesystem.  The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem.  However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd.  Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.

Filesystem                         Size  Used Avail Use% Mounted on
nodev                              4.0G -4.0M  4.1G    - /opt/hugepool

To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8dfdffc34a99..64ef640126cd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1257,12 +1257,23 @@ void free_huge_page(struct page *page)
 	ClearPagePrivate(page);
 
 	/*
-	 * A return code of zero implies that the subpool will be under its
-	 * minimum size if the reservation is not restored after page is free.
-	 * Therefore, force restore_reserve operation.
+	 * If PagePrivate() was set on page, page allocation consumed a
+	 * reservation.  If the page was associated with a subpool, there
+	 * would have been a page reserved in the subpool before allocation
+	 * via hugepage_subpool_get_pages().  Since we are 'restoring' the
+	 * reservtion, do not call hugepage_subpool_put_pages() as this will
+	 * remove the reserved page from the subpool.
 	 */
-	if (hugepage_subpool_put_pages(spool, 1) == 0)
-		restore_reserve = true;
+	if (!restore_reserve) {
+		/*
+		 * A return code of zero implies that the subpool will be
+		 * under its minimum size if the reservation is not restored
+		 * after page is free.  Therefore, force restore_reserve
+		 * operation.
+		 */
+		if (hugepage_subpool_put_pages(spool, 1) == 0)
+			restore_reserve = true;
+	}
 
 	spin_lock(&hugetlb_lock);
 	clear_page_huge_active(page);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings
  2019-03-08 22:48 [PATCH 0/2] A couple hugetlbfs fixes Mike Kravetz
  2019-03-08 22:48 ` [PATCH 1/2] huegtlbfs: on restore reserve error path retain subpool reservation Mike Kravetz
@ 2019-03-08 22:48 ` Mike Kravetz
  2019-03-08 23:08   ` Mike Kravetz
  1 sibling, 1 reply; 4+ messages in thread
From: Mike Kravetz @ 2019-03-08 22:48 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Davidlohr Bueso
  Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov,
	Andrew Morton, Mike Kravetz

hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently.  The key for shared and private mappings is
different.  Shared keys off address_space and file index.  Private
keys off mm and virtual address.  Consider a private mappings of a
populated hugetlbfs file.  A write fault will first map the page from
the file and then do a COW to map a writable page.

Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages.  It uses the address_space file index key.  However, private
mappings will use a different key and could temporarily map the file
page before COW.  This causes problems (BUG) for the hole punch code
as it expects the mutex to prevent additional uses/mappings of the page.

There seems to be another potential COW issue/race with this approach
of different private and shared keys as notes in commit 8382d914ebf7
("mm, hugetlb: improve page-fault scalability").

Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings.  This
results in potentially more hash collisions.  However, this should not
be the common case.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64ef640126cd..0527732c71f0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3904,13 +3904,8 @@ u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm,
 	unsigned long key[2];
 	u32 hash;
 
-	if (vma->vm_flags & VM_SHARED) {
-		key[0] = (unsigned long) mapping;
-		key[1] = idx;
-	} else {
-		key[0] = (unsigned long) mm;
-		key[1] = address >> huge_page_shift(h);
-	}
+	key[0] = (unsigned long) mapping;
+	key[1] = idx;
 
 	hash = jhash2((u32 *)&key, sizeof(key)/sizeof(u32), 0);
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings
  2019-03-08 22:48 ` [PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz
@ 2019-03-08 23:08   ` Mike Kravetz
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Kravetz @ 2019-03-08 23:08 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Davidlohr Bueso
  Cc: Joonsoo Kim, Michal Hocko, Naoya Horiguchi, Kirill A . Shutemov,
	Andrew Morton

On 3/8/19 2:48 PM, Mike Kravetz wrote:
>  mm/hugetlb.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 64ef640126cd..0527732c71f0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3904,13 +3904,8 @@ u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm,
>  	unsigned long key[2];
>  	u32 hash;
>  
> -	if (vma->vm_flags & VM_SHARED) {
> -		key[0] = (unsigned long) mapping;
> -		key[1] = idx;
> -	} else {
> -		key[0] = (unsigned long) mm;
> -		key[1] = address >> huge_page_shift(h);
> -	}
> +	key[0] = (unsigned long) mapping;
> +	key[1] = idx;
>  
>  	hash = jhash2((u32 *)&key, sizeof(key)/sizeof(u32), 0);

Duh!

If we no longer use mm and address they can be dropped from the function
arguments and all callers.  Before doing that, let's see if there is any
objection to using the same key for shared and private mappings.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-03-08 23:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-08 22:48 [PATCH 0/2] A couple hugetlbfs fixes Mike Kravetz
2019-03-08 22:48 ` [PATCH 1/2] huegtlbfs: on restore reserve error path retain subpool reservation Mike Kravetz
2019-03-08 22:48 ` [PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings Mike Kravetz
2019-03-08 23:08   ` Mike Kravetz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).