linux-mm.kvack.org archive mirror
* [PATCH 00/11] Remove assumptions of THP size
@ 2020-09-08 19:55 Matthew Wilcox (Oracle)
  2020-09-08 19:55 ` [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs Matthew Wilcox (Oracle)
                   ` (10 more replies)
  0 siblings, 11 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

There are a number of places in the VM which assume that a THP is a PMD
in size.  That's true today, and remains true after this patch series,
but this is a prerequisite for switching to arbitrary-sized THPs.
thp_nr_pages() still returns either HPAGE_PMD_NR or 1, but will be
changed later.

Kirill A. Shutemov (2):
  mm/huge_memory: Fix total_mapcount assumption of page size
  mm/huge_memory: Fix split assumption of page size

Matthew Wilcox (Oracle) (9):
  mm/filemap: Fix page cache removal for arbitrary sized THPs
  mm/memory: Remove page fault assumption of compound page size
  mm/page_owner: Change split_page_owner to take a count
  mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size
  mm/huge_memory: Fix can_split_huge_page assumption of THP size
  mm/rmap: Fix assumptions of THP size
  mm/truncate: Fix truncation for pages of arbitrary size
  mm/page-writeback: Support tail pages in wait_for_stable_page
  mm/vmscan: Allow arbitrary sized pages to be paged out

 include/linux/page_owner.h |  6 +++---
 mm/filemap.c               |  2 +-
 mm/huge_memory.c           | 32 +++++++++++++++++---------------
 mm/memory.c                |  7 ++++---
 mm/page-writeback.c        |  1 +
 mm/page_owner.c            |  4 ++--
 mm/rmap.c                  | 10 +++++-----
 mm/truncate.c              |  6 +++---
 mm/vmscan.c                |  3 +--
 9 files changed, 37 insertions(+), 34 deletions(-)

-- 
2.28.0



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:27   ` Kirill A. Shutemov
  2020-09-15  7:13   ` SeongJae Park
  2020-09-08 19:55 ` [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

page_cache_free_page() assumes THPs are PMD_SIZE; fix that assumption.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..c60b94fd74ec 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -249,7 +249,7 @@ static void page_cache_free_page(struct address_space *mapping,
 		freepage(page);
 
 	if (PageTransHuge(page) && !PageHuge(page)) {
-		page_ref_sub(page, HPAGE_PMD_NR);
+		page_ref_sub(page, thp_nr_pages(page));
 		VM_BUG_ON_PAGE(page_count(page) <= 0, page);
 	} else {
 		put_page(page);
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
  2020-09-08 19:55 ` [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:29   ` Kirill A. Shutemov
  2020-09-08 19:55 ` [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A . Shutemov, Huang Ying

A compound page in the page cache will not necessarily be of PMD size,
so check explicitly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 602f4283122f..4b35b4e71e64 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	pmd_t entry;
 	int i;
-	vm_fault_t ret;
+	vm_fault_t ret = VM_FAULT_FALLBACK;
 
 	if (!transhuge_vma_suitable(vma, haddr))
-		return VM_FAULT_FALLBACK;
+		return ret;
 
-	ret = VM_FAULT_FALLBACK;
 	page = compound_head(page);
+	if (page_order(page) != HPAGE_PMD_ORDER)
+		return ret;
 
 	/*
 	 * Archs like ppc64 need additional space to store information
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
  2020-09-08 19:55 ` [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs Matthew Wilcox (Oracle)
  2020-09-08 19:55 ` [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:42   ` Kirill A. Shutemov
                     ` (2 more replies)
  2020-09-08 19:55 ` [PATCH 04/11] mm/huge_memory: Fix total_mapcount assumption of page size Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

The implementation of split_page_owner() prefers a count rather than the
old order of the page.  When we support a variable size THP, we won't
have the order at this point, but we will have the number of pages.
So change the interface to what the caller and callee would prefer.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page_owner.h | 6 +++---
 mm/huge_memory.c           | 2 +-
 mm/page_owner.c            | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 8679ccd722e8..3468794f83d2 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -11,7 +11,7 @@ extern struct page_ext_operations page_owner_ops;
 extern void __reset_page_owner(struct page *page, unsigned int order);
 extern void __set_page_owner(struct page *page,
 			unsigned int order, gfp_t gfp_mask);
-extern void __split_page_owner(struct page *page, unsigned int order);
+extern void __split_page_owner(struct page *page, unsigned int nr);
 extern void __copy_page_owner(struct page *oldpage, struct page *newpage);
 extern void __set_page_owner_migrate_reason(struct page *page, int reason);
 extern void __dump_page_owner(struct page *page);
@@ -31,10 +31,10 @@ static inline void set_page_owner(struct page *page,
 		__set_page_owner(page, order, gfp_mask);
 }
 
-static inline void split_page_owner(struct page *page, unsigned int order)
+static inline void split_page_owner(struct page *page, unsigned int nr)
 {
 	if (static_branch_unlikely(&page_owner_inited))
-		__split_page_owner(page, order);
+		__split_page_owner(page, nr);
 }
 static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2ccff8472cd4..adc5a91d8fd4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2418,7 +2418,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	ClearPageCompound(head);
 
-	split_page_owner(head, HPAGE_PMD_ORDER);
+	split_page_owner(head, HPAGE_PMD_NR);
 
 	/* See comment in __split_huge_page_tail() */
 	if (PageAnon(head)) {
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 360461509423..4ca3051a1035 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
 	page_owner->last_migrate_reason = reason;
 }
 
-void __split_page_owner(struct page *page, unsigned int order)
+void __split_page_owner(struct page *page, unsigned int nr)
 {
 	int i;
 	struct page_ext *page_ext = lookup_page_ext(page);
@@ -213,7 +213,7 @@ void __split_page_owner(struct page *page, unsigned int order)
 	if (unlikely(!page_ext))
 		return;
 
-	for (i = 0; i < (1 << order); i++) {
+	for (i = 0; i < nr; i++) {
 		page_owner = get_page_owner(page_ext);
 		page_owner->order = 0;
 		page_ext = page_ext_next(page_ext);
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 04/11] mm/huge_memory: Fix total_mapcount assumption of page size
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-15  7:21   ` SeongJae Park
  2020-09-08 19:55 ` [PATCH 05/11] mm/huge_memory: Fix split " Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Kirill A. Shutemov, Huang Ying, Matthew Wilcox

From: "Kirill A. Shutemov" <kirill@shutemov.name>

File THPs may now be of arbitrary order.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index adc5a91d8fd4..a882d770a812 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2458,7 +2458,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 int total_mapcount(struct page *page)
 {
-	int i, compound, ret;
+	int i, compound, nr, ret;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
@@ -2466,16 +2466,17 @@ int total_mapcount(struct page *page)
 		return atomic_read(&page->_mapcount) + 1;
 
 	compound = compound_mapcount(page);
+	nr = compound_nr(page);
 	if (PageHuge(page))
 		return compound;
 	ret = compound;
-	for (i = 0; i < HPAGE_PMD_NR; i++)
+	for (i = 0; i < nr; i++)
 		ret += atomic_read(&page[i]._mapcount) + 1;
 	/* File pages have compound_mapcount included in _mapcount */
 	if (!PageAnon(page))
-		return ret - compound * HPAGE_PMD_NR;
+		return ret - compound * nr;
 	if (PageDoubleMap(page))
-		ret -= HPAGE_PMD_NR;
+		ret -= nr;
 	return ret;
 }
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 05/11] mm/huge_memory: Fix split assumption of page size
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 04/11] mm/huge_memory: Fix total_mapcount assumption of page size Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-15  7:23   ` SeongJae Park
  2020-09-08 19:55 ` [PATCH 06/11] mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Kirill A. Shutemov, Huang Ying, Matthew Wilcox

From: "Kirill A. Shutemov" <kirill@shutemov.name>

File THPs may now be of arbitrary size, and we can't rely on that size
after the split, so remember the number of pages before we start the
split.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a882d770a812..7bf837c32e3f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2302,13 +2302,13 @@ static void unmap_page(struct page *page)
 	VM_BUG_ON_PAGE(!unmap_success, page);
 }
 
-static void remap_page(struct page *page)
+static void remap_page(struct page *page, unsigned int nr)
 {
 	int i;
 	if (PageTransHuge(page)) {
 		remove_migration_ptes(page, page, true);
 	} else {
-		for (i = 0; i < HPAGE_PMD_NR; i++)
+		for (i = 0; i < nr; i++)
 			remove_migration_ptes(page + i, page + i, true);
 	}
 }
@@ -2383,6 +2383,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	struct lruvec *lruvec;
 	struct address_space *swap_cache = NULL;
 	unsigned long offset = 0;
+	unsigned int nr = thp_nr_pages(head);
 	int i;
 
 	lruvec = mem_cgroup_page_lruvec(head, pgdat);
@@ -2398,7 +2399,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
-	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
+	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
 		if (head[i].index >= end) {
@@ -2418,7 +2419,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	ClearPageCompound(head);
 
-	split_page_owner(head, HPAGE_PMD_NR);
+	split_page_owner(head, nr);
 
 	/* See comment in __split_huge_page_tail() */
 	if (PageAnon(head)) {
@@ -2437,9 +2438,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
 
-	remap_page(head);
+	remap_page(head, nr);
 
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < nr; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
 			continue;
@@ -2693,7 +2694,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
 		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
-		remap_page(head);
+		remap_page(head, thp_nr_pages(head));
 		ret = -EBUSY;
 	}
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 06/11] mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 05/11] mm/huge_memory: Fix split " Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:45   ` Kirill A. Shutemov
  2020-09-15  7:24   ` SeongJae Park
  2020-09-08 19:55 ` [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page " Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

Ask the page what size it is instead of assuming it's PMD size.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7bf837c32e3f..e9503b10df8f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2522,14 +2522,14 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
 	page = compound_head(page);
 
 	_total_mapcount = ret = 0;
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < thp_nr_pages(page); i++) {
 		mapcount = atomic_read(&page[i]._mapcount) + 1;
 		ret = max(ret, mapcount);
 		_total_mapcount += mapcount;
 	}
 	if (PageDoubleMap(page)) {
 		ret -= 1;
-		_total_mapcount -= HPAGE_PMD_NR;
+		_total_mapcount -= thp_nr_pages(page);
 	}
 	mapcount = compound_mapcount(page);
 	ret += mapcount;
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page assumption of THP size
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 06/11] mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:46   ` Kirill A. Shutemov
                     ` (2 more replies)
  2020-09-08 19:55 ` [PATCH 08/11] mm/rmap: Fix assumptions " Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

Ask the page how many subpages it has instead of assuming it's PMD size.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9503b10df8f..8bf8caf66923 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2546,9 +2546,9 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
 
 	/* Additional pins from page cache */
 	if (PageAnon(page))
-		extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
+		extra_pins = PageSwapCache(page) ? thp_nr_pages(page) : 0;
 	else
-		extra_pins = HPAGE_PMD_NR;
+		extra_pins = thp_nr_pages(page);
 	if (pextra_pins)
 		*pextra_pins = extra_pins;
 	return total_mapcount(page) == page_count(page) - extra_pins - 1;
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/11] mm/rmap: Fix assumptions of THP size
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page " Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:47   ` Kirill A. Shutemov
  2020-09-15  7:27   ` SeongJae Park
  2020-09-08 19:55 ` [PATCH 09/11] mm/truncate: Fix truncation for pages of arbitrary size Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

Ask the page what size it is instead of assuming it's PMD size.  Do this
for anon pages as well as file pages for when someone decides to support
that.  Leave the assumption alone for pages which are PMD mapped; we
don't currently grow THPs beyond PMD size, so we don't need to change
this code yet.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/rmap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 83cc459edc40..10f93129648c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1205,7 +1205,7 @@ void page_add_file_rmap(struct page *page, bool compound)
 	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
 	lock_page_memcg(page);
 	if (compound && PageTransHuge(page)) {
-		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+		for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
 			if (atomic_inc_and_test(&page[i]._mapcount))
 				nr++;
 		}
@@ -1246,7 +1246,7 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 
 	/* page still mapped by someone else? */
 	if (compound && PageTransHuge(page)) {
-		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+		for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
 			if (atomic_add_negative(-1, &page[i]._mapcount))
 				nr++;
 		}
@@ -1293,7 +1293,7 @@ static void page_remove_anon_compound_rmap(struct page *page)
 		 * Subpages can be mapped with PTEs too. Check how many of
 		 * them are still mapped.
 		 */
-		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+		for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
 			if (atomic_add_negative(-1, &page[i]._mapcount))
 				nr++;
 		}
@@ -1303,10 +1303,10 @@ static void page_remove_anon_compound_rmap(struct page *page)
 		 * page of the compound page is unmapped, but at least one
 		 * small page is still mapped.
 		 */
-		if (nr && nr < HPAGE_PMD_NR)
+		if (nr && nr < thp_nr_pages(page))
 			deferred_split_huge_page(page);
 	} else {
-		nr = HPAGE_PMD_NR;
+		nr = thp_nr_pages(page);
 	}
 
 	if (unlikely(PageMlocked(page)))
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/11] mm/truncate: Fix truncation for pages of arbitrary size
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 08/11] mm/rmap: Fix assumptions " Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:50   ` Kirill A. Shutemov
  2020-09-15  7:36   ` SeongJae Park
  2020-09-08 19:55 ` [PATCH 10/11] mm/page-writeback: Support tail pages in wait_for_stable_page Matthew Wilcox (Oracle)
  2020-09-08 19:55 ` [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out Matthew Wilcox (Oracle)
  10 siblings, 2 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

Remove the assumption that a compound page is HPAGE_PMD_SIZE,
and the assumption that any page is PAGE_SIZE.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/truncate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index dd9ebc1da356..1cc93b57fb41 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -168,7 +168,7 @@ void do_invalidatepage(struct page *page, unsigned int offset,
  * becomes orphaned.  It will be left on the LRU and may even be mapped into
  * user pagetables if we're racing with filemap_fault().
  *
- * We need to bale out if page->mapping is no longer equal to the original
+ * We need to bail out if page->mapping is no longer equal to the original
  * mapping.  This happens a) when the VM reclaimed the page while we waited on
  * its lock, b) when a concurrent invalidate_mapping_pages got there first and
  * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
@@ -177,12 +177,12 @@ static void
 truncate_cleanup_page(struct address_space *mapping, struct page *page)
 {
 	if (page_mapped(page)) {
-		pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
+		unsigned int nr = thp_nr_pages(page);
 		unmap_mapping_pages(mapping, page->index, nr, false);
 	}
 
 	if (page_has_private(page))
-		do_invalidatepage(page, 0, PAGE_SIZE);
+		do_invalidatepage(page, 0, thp_size(page));
 
 	/*
 	 * Some filesystems seem to re-dirty the page even after
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/11] mm/page-writeback: Support tail pages in wait_for_stable_page
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 09/11] mm/truncate: Fix truncation for pages of arbitrary size Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:53   ` Kirill A. Shutemov
  2020-09-15  7:37   ` SeongJae Park
  2020-09-08 19:55 ` [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out Matthew Wilcox (Oracle)
  10 siblings, 2 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

page->mapping is undefined for tail pages, so operate exclusively on
the head page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/page-writeback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4e4ddd67b71e..dac075e451d3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2849,6 +2849,7 @@ EXPORT_SYMBOL_GPL(wait_on_page_writeback);
  */
 void wait_for_stable_page(struct page *page)
 {
+	page = thp_head(page);
 	if (bdi_cap_stable_pages_required(inode_to_bdi(page->mapping->host)))
 		wait_on_page_writeback(page);
 }
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out
  2020-09-08 19:55 [PATCH 00/11] Remove assumptions of THP size Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2020-09-08 19:55 ` [PATCH 10/11] mm/page-writeback: Support tail pages in wait_for_stable_page Matthew Wilcox (Oracle)
@ 2020-09-08 19:55 ` Matthew Wilcox (Oracle)
  2020-09-09 14:55   ` Kirill A. Shutemov
  2020-09-15  7:40   ` SeongJae Park
  10 siblings, 2 replies; 39+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-08 19:55 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Matthew Wilcox (Oracle), Kirill A. Shutemov, Huang Ying

Remove the assumption that a compound page has HPAGE_PMD_NR pins from
the page cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Huang Ying <ying.huang@intel.com>
---
 mm/vmscan.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 99e1796eb833..91b787fff71a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -722,8 +722,7 @@ static inline int is_page_cache_freeable(struct page *page)
 	 * that isolated the page, the page cache and optional buffer
 	 * heads at page->private.
 	 */
-	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
-		HPAGE_PMD_NR : 1;
+	int page_cache_pins = thp_nr_pages(page);
 	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
 }
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs
  2020-09-08 19:55 ` [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs Matthew Wilcox (Oracle)
@ 2020-09-09 14:27   ` Kirill A. Shutemov
  2020-09-15  7:13   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:28PM +0100, Matthew Wilcox (Oracle) wrote:
> page_cache_free_page() assumes THPs are PMD_SIZE; fix that assumption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size
  2020-09-08 19:55 ` [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size Matthew Wilcox (Oracle)
@ 2020-09-09 14:29   ` Kirill A. Shutemov
  2020-09-09 14:50     ` Matthew Wilcox
  0 siblings, 1 reply; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote:
> A compound page in the page cache will not necessarily be of PMD size,
> so check explicitly.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/memory.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 602f4283122f..4b35b4e71e64 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>  	pmd_t entry;
>  	int i;
> -	vm_fault_t ret;
> +	vm_fault_t ret = VM_FAULT_FALLBACK;
>  
>  	if (!transhuge_vma_suitable(vma, haddr))
> -		return VM_FAULT_FALLBACK;
> +		return ret;
>  
> -	ret = VM_FAULT_FALLBACK;
>  	page = compound_head(page);
> +	if (page_order(page) != HPAGE_PMD_ORDER)
> +		return ret;

Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)?
Just in case.

>  
>  	/*
> 	 * Archs like ppc64 need additional space to store information
> -- 
> 2.28.0
> 

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count
  2020-09-08 19:55 ` [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count Matthew Wilcox (Oracle)
@ 2020-09-09 14:42   ` Kirill A. Shutemov
  2020-09-15  7:17   ` SeongJae Park
  2020-10-13 13:52   ` Matthew Wilcox
  2 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:30PM +0100, Matthew Wilcox (Oracle) wrote:
> The implementation of split_page_owner() prefers a count rather than the
> old order of the page.  When we support a variable size THP, we won't
> have the order at this point, but we will have the number of pages.
> So change the interface to what the caller and callee would prefer.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/11] mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size
  2020-09-08 19:55 ` [PATCH 06/11] mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size Matthew Wilcox (Oracle)
@ 2020-09-09 14:45   ` Kirill A. Shutemov
  2020-09-15  7:24   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:45 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:33PM +0100, Matthew Wilcox (Oracle) wrote:
> Ask the page what size it is instead of assuming it's PMD size.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page assumption of THP size
  2020-09-08 19:55 ` [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page " Matthew Wilcox (Oracle)
@ 2020-09-09 14:46   ` Kirill A. Shutemov
  2020-09-15  7:25   ` SeongJae Park
  2020-09-16  1:44   ` Huang, Ying
  2 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:46 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:34PM +0100, Matthew Wilcox (Oracle) wrote:
> Ask the page how many subpages it has instead of assuming it's PMD size.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 08/11] mm/rmap: Fix assumptions of THP size
  2020-09-08 19:55 ` [PATCH 08/11] mm/rmap: Fix assumptions " Matthew Wilcox (Oracle)
@ 2020-09-09 14:47   ` Kirill A. Shutemov
  2020-09-15  7:27   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:35PM +0100, Matthew Wilcox (Oracle) wrote:
> Ask the page what size it is instead of assuming it's PMD size.  Do this
> for anon pages as well as file pages for when someone decides to support
> that.  Leave the assumption alone for pages which are PMD mapped; we
> don't currently grow THPs beyond PMD size, so we don't need to change
> this code yet.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH 09/11] mm/truncate: Fix truncation for pages of arbitrary size
  2020-09-08 19:55 ` [PATCH 09/11] mm/truncate: Fix truncation for pages of arbitrary size Matthew Wilcox (Oracle)
@ 2020-09-09 14:50   ` Kirill A. Shutemov
  2020-09-15  7:36   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:36PM +0100, Matthew Wilcox (Oracle) wrote:
> Remove the assumption that a compound page is HPAGE_PMD_SIZE,
> and the assumption that any page is PAGE_SIZE.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size
  2020-09-09 14:29   ` Kirill A. Shutemov
@ 2020-09-09 14:50     ` Matthew Wilcox
  2020-09-11 14:51       ` Kirill A. Shutemov
  0 siblings, 1 reply; 39+ messages in thread
From: Matthew Wilcox @ 2020-09-09 14:50 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-mm, Andrew Morton, Huang Ying

On Wed, Sep 09, 2020 at 05:29:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote:
> > A compound page in the page cache will not necessarily be of PMD size,
> > so check explicitly.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  mm/memory.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 602f4283122f..4b35b4e71e64 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> >  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> >  	pmd_t entry;
> >  	int i;
> > -	vm_fault_t ret;
> > +	vm_fault_t ret = VM_FAULT_FALLBACK;
> >  
> >  	if (!transhuge_vma_suitable(vma, haddr))
> > -		return VM_FAULT_FALLBACK;
> > +		return ret;
> >  
> > -	ret = VM_FAULT_FALLBACK;
> >  	page = compound_head(page);
> > +	if (page_order(page) != HPAGE_PMD_ORDER)
> > +		return ret;
> 
> Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)?
> Just in case.

In the patch where I actually start creating THPs, I limit the order to
HPAGE_PMD_ORDER, so we're not going to see this today.  At some point
in the future, I can imagine that we allow THPs larger than PMD size,
and what we'd want alloc_set_pte() to look like is:

	if (pud_none(*vmf->pud) && PageTransCompound(page)) {
		ret = do_set_pud(vmf, page);
		if (ret != VM_FAULT_FALLBACK)
			return ret;
	}
	if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
		ret = do_set_pmd(vmf, page);
		if (ret != VM_FAULT_FALLBACK)
			return ret;
	}

Once we're in that situation, in do_set_pmd(), we'd want to figure out
which sub-page of the >PMD-sized page to insert.  But I don't want to
write code for that now.

So, what's the right approach if somebody does call alloc_set_pte()
with a >PMD sized page?  It's not exported, so the only two ways to get
it called with a >PMD sized page are to (1) persuade filemap_map_pages()
to call it, which means putting it in the page cache, or (2) return it
from vm_ops->fault.  If someone actually does that (an interesting
device driver, perhaps), I don't think hitting it with a BUG is the
right response.  I think it should actually be to map the right PMD-sized
chunk of the page, but we don't even do that today -- we map the first
PMD-sized chunk of the page.

With this patch, we'll simply map the appropriate PAGE_SIZE chunk at the
requested address.  So this would be a bugfix for such a demented driver.
At some point, it'd be nice to handle this with a PMD, but I don't want
to write that code without a test-case.  We could probably simulate
it with the page cache THP code and be super-aggressive about creating
order-10 pages ... but this is feeling more and more out of scope for
this patch set, which today hit 99 patches.



* Re: [PATCH 10/11] mm/page-writeback: Support tail pages in wait_for_stable_page
  2020-09-08 19:55 ` [PATCH 10/11] mm/page-writeback: Support tail pages in wait_for_stable_page Matthew Wilcox (Oracle)
@ 2020-09-09 14:53   ` Kirill A. Shutemov
  2020-09-15  7:37   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:37PM +0100, Matthew Wilcox (Oracle) wrote:
> page->mapping is undefined for tail pages, so operate exclusively on
> the head page.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out
  2020-09-08 19:55 ` [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out Matthew Wilcox (Oracle)
@ 2020-09-09 14:55   ` Kirill A. Shutemov
  2020-09-15  7:40   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-09 14:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Huang Ying

On Tue, Sep 08, 2020 at 08:55:38PM +0100, Matthew Wilcox (Oracle) wrote:
> Remove the assumption that a compound page has HPAGE_PMD_NR pins from
> the page cache.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Huang Ying <ying.huang@intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size
  2020-09-09 14:50     ` Matthew Wilcox
@ 2020-09-11 14:51       ` Kirill A. Shutemov
  0 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2020-09-11 14:51 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Andrew Morton, Huang Ying

On Wed, Sep 09, 2020 at 03:50:35PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 09, 2020 at 05:29:04PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote:
> > > A compound page in the page cache will not necessarily be of PMD size,
> > > so check explicitly.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > >  mm/memory.c | 7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 602f4283122f..4b35b4e71e64 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> > >  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> > >  	pmd_t entry;
> > >  	int i;
> > > -	vm_fault_t ret;
> > > +	vm_fault_t ret = VM_FAULT_FALLBACK;
> > >  
> > >  	if (!transhuge_vma_suitable(vma, haddr))
> > > -		return VM_FAULT_FALLBACK;
> > > +		return ret;
> > >  
> > > -	ret = VM_FAULT_FALLBACK;
> > >  	page = compound_head(page);
> > > +	if (page_order(page) != HPAGE_PMD_ORDER)
> > > +		return ret;
> > 
> > Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)?
> > Just in case.
> 
> In the patch where I actually start creating THPs, I limit the order to
> HPAGE_PMD_ORDER, so we're not going to see this today.  At some point
> in the future, I can imagine that we allow THPs larger than PMD size,
> and what we'd want alloc_set_pte() to look like is:
> 
> 	if (pud_none(*vmf->pud) && PageTransCompound(page)) {
> 		ret = do_set_pud(vmf, page);
> 		if (ret != VM_FAULT_FALLBACK)
> 			return ret;
> 	}
> 	if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
> 		ret = do_set_pmd(vmf, page);
> 		if (ret != VM_FAULT_FALLBACK)
> 			return ret;
> 	}
> 
> Once we're in that situation, in do_set_pmd(), we'd want to figure out
> which sub-page of the >PMD-sized page to insert.  But I don't want to
> write code for that now.
> 
> So, what's the right approach if somebody does call alloc_set_pte()
> with a >PMD sized page?  It's not exported, so the only two ways to get
> it called with a >PMD sized page is to (1) persuade filemap_map_pages()
> to call it, which means putting it in the page cache or (2) return it
> from vm_ops->fault.  If someone actually does that (an interesting
> device driver, perhaps), I don't think hitting it with a BUG is the
> right response.  I think it should actually be to map the right PMD-sized
> chunk of the page, but we don't even do that today -- we map the first
> PMD-sized chunk of the page.
> 
> With this patch, we'll simply map the appropriate PAGE_SIZE chunk at the
> requested address.  So this would be a bugfix for such a demented driver.
> At some point, it'd be nice to handle this with a PMD, but I don't want
> to write that code without a test-case.  We could probably simulate
> it with the page cache THP code and be super-aggressive about creating
> order-10 pages ... but this is feeling more and more out of scope for
> this patch set, which today hit 99 patches.

Okay, fair enough. A VM_BUG is too strong a reaction here, since we can
make a reasonable fallback. Maybe WARN_ON_ONCE() would make sense? It
would flag the place that has to be adjusted once we get above
PMD-order pages.

-- 
 Kirill A. Shutemov



* Re: [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs
  2020-09-08 19:55 ` [PATCH 01/11] mm/filemap: Fix page cache removal for arbitrary sized THPs Matthew Wilcox (Oracle)
  2020-09-09 14:27   ` Kirill A. Shutemov
@ 2020-09-15  7:13   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:13 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:28 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> page_cache_free_page() assumes THPs are PMD_SIZE; fix that assumption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>

Thanks,
SeongJae Park



* Re: [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count
  2020-09-08 19:55 ` [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count Matthew Wilcox (Oracle)
  2020-09-09 14:42   ` Kirill A. Shutemov
@ 2020-09-15  7:17   ` SeongJae Park
  2020-10-13 13:52   ` Matthew Wilcox
  2 siblings, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:17 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:30 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> The implementation of split_page_owner() prefers a count rather than the
> old order of the page.  When we support a variable size THP, we won't
> have the order at this point, but we will have the number of pages.
> So change the interface to what the caller and callee would prefer.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 04/11] mm/huge_memory: Fix total_mapcount assumption of page size
  2020-09-08 19:55 ` [PATCH 04/11] mm/huge_memory: Fix total_mapcount assumption of page size Matthew Wilcox (Oracle)
@ 2020-09-15  7:21   ` SeongJae Park
  0 siblings, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:21 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:31 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> From: "Kirill A. Shutemov" <kirill@shutemov.name>
> 
> File THPs may now be of arbitrary order.
> 
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 05/11] mm/huge_memory: Fix split assumption of page size
  2020-09-08 19:55 ` [PATCH 05/11] mm/huge_memory: Fix split " Matthew Wilcox (Oracle)
@ 2020-09-15  7:23   ` SeongJae Park
  0 siblings, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:32 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> From: "Kirill A. Shutemov" <kirill@shutemov.name>
> 
> File THPs may now be of arbitrary size, and we can't rely on that size
> after doing the split so remember the number of pages before we start
> the split.
> 
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 06/11] mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size
  2020-09-08 19:55 ` [PATCH 06/11] mm/huge_memory: Fix page_trans_huge_mapcount assumption of THP size Matthew Wilcox (Oracle)
  2020-09-09 14:45   ` Kirill A. Shutemov
@ 2020-09-15  7:24   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:33 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> Ask the page what size it is instead of assuming it's PMD size.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page assumption of THP size
  2020-09-08 19:55 ` [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page " Matthew Wilcox (Oracle)
  2020-09-09 14:46   ` Kirill A. Shutemov
@ 2020-09-15  7:25   ` SeongJae Park
  2020-09-16  1:44   ` Huang, Ying
  2 siblings, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:34 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> Ask the page how many subpages it has instead of assuming it's PMD size.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 08/11] mm/rmap: Fix assumptions of THP size
  2020-09-08 19:55 ` [PATCH 08/11] mm/rmap: Fix assumptions " Matthew Wilcox (Oracle)
  2020-09-09 14:47   ` Kirill A. Shutemov
@ 2020-09-15  7:27   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:35 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> Ask the page what size it is instead of assuming it's PMD size.  Do this
> for anon pages as well as file pages for when someone decides to support
> that.  Leave the assumption alone for pages which are PMD mapped; we
> don't currently grow THPs beyond PMD size, so we don't need to change
> this code yet.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 09/11] mm/truncate: Fix truncation for pages of arbitrary size
  2020-09-08 19:55 ` [PATCH 09/11] mm/truncate: Fix truncation for pages of arbitrary size Matthew Wilcox (Oracle)
  2020-09-09 14:50   ` Kirill A. Shutemov
@ 2020-09-15  7:36   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:36 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> Remove the assumption that a compound page is HPAGE_PMD_SIZE,
> and the assumption that any page is PAGE_SIZE.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 10/11] mm/page-writeback: Support tail pages in wait_for_stable_page
  2020-09-08 19:55 ` [PATCH 10/11] mm/page-writeback: Support tail pages in wait_for_stable_page Matthew Wilcox (Oracle)
  2020-09-09 14:53   ` Kirill A. Shutemov
@ 2020-09-15  7:37   ` SeongJae Park
  1 sibling, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:37 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> page->mapping is undefined for tail pages, so operate exclusively on
> the head page.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out
  2020-09-08 19:55 ` [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out Matthew Wilcox (Oracle)
  2020-09-09 14:55   ` Kirill A. Shutemov
@ 2020-09-15  7:40   ` SeongJae Park
  2020-09-15 12:52     ` Matthew Wilcox
  1 sibling, 1 reply; 39+ messages in thread
From: SeongJae Park @ 2020-09-15  7:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue,  8 Sep 2020 20:55:38 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> Remove the assumption that a compound page has HPAGE_PMD_NR pins from
> the page cache.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Huang Ying <ying.huang@intel.com>
> ---
>  mm/vmscan.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 99e1796eb833..91b787fff71a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -722,8 +722,7 @@ static inline int is_page_cache_freeable(struct page *page)
>  	 * that isolated the page, the page cache and optional buffer
>  	 * heads at page->private.
>  	 */
> -	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
> -		HPAGE_PMD_NR : 1;
> +	int page_cache_pins = thp_nr_pages(page);

Is it ok to remove the PageSwapCache() check?


Thanks,
SeongJae Park



* Re: [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out
  2020-09-15  7:40   ` SeongJae Park
@ 2020-09-15 12:52     ` Matthew Wilcox
  2020-09-16  1:40       ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Matthew Wilcox @ 2020-09-15 12:52 UTC (permalink / raw)
  To: SeongJae Park; +Cc: linux-mm, Andrew Morton, Kirill A . Shutemov, Huang Ying

On Tue, Sep 15, 2020 at 09:40:45AM +0200, SeongJae Park wrote:
> On Tue,  8 Sep 2020 20:55:38 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> > Remove the assumption that a compound page has HPAGE_PMD_NR pins from
> > the page cache.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: Huang Ying <ying.huang@intel.com>

> > -	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
> > -		HPAGE_PMD_NR : 1;
> > +	int page_cache_pins = thp_nr_pages(page);
> 
> Is it ok to remove the PageSwapCache() check?

I think so?  My understanding is that it was added in commit bd4c82c22c36
to catch shmem pages, but there was really no reason to only do this for
shmem pages.



* Re: [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out
  2020-09-15 12:52     ` Matthew Wilcox
@ 2020-09-16  1:40       ` Huang, Ying
  2020-09-16  6:09         ` SeongJae Park
  2020-09-30 12:13         ` Matthew Wilcox
  0 siblings, 2 replies; 39+ messages in thread
From: Huang, Ying @ 2020-09-16  1:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: SeongJae Park, linux-mm, Andrew Morton, Kirill A . Shutemov

Matthew Wilcox <willy@infradead.org> writes:

> On Tue, Sep 15, 2020 at 09:40:45AM +0200, SeongJae Park wrote:
>> On Tue,  8 Sep 2020 20:55:38 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
>> > Remove the assumption that a compound page has HPAGE_PMD_NR pins from
>> > the page cache.
>> > 
>> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> > Cc: Huang Ying <ying.huang@intel.com>
>
>> > -	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
>> > -		HPAGE_PMD_NR : 1;
>> > +	int page_cache_pins = thp_nr_pages(page);
>> 
>> Is it ok to remove the PageSwapCache() check?
>
> I think so?  My understanding is that it was added in commit bd4c82c22c36
> to catch shmem pages, but there was really no reason to only do this for
> shmem pages.

The original implementation was to write out anonymous THPs (not shmem).
The code should work after the change, because any THP other than a
normal anonymous THP in the swap cache will already have been split
during reclaim.

Acked-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying



* Re: [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page assumption of THP size
  2020-09-08 19:55 ` [PATCH 07/11] mm/huge_memory: Fix can_split_huge_page " Matthew Wilcox (Oracle)
  2020-09-09 14:46   ` Kirill A. Shutemov
  2020-09-15  7:25   ` SeongJae Park
@ 2020-09-16  1:44   ` Huang, Ying
  2 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2020-09-16  1:44 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, Andrew Morton, Kirill A . Shutemov

"Matthew Wilcox (Oracle)" <willy@infradead.org> writes:

> Ask the page how many subpages it has instead of assuming it's PMD size.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e9503b10df8f..8bf8caf66923 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2546,9 +2546,9 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
>  
>  	/* Additional pins from page cache */
>  	if (PageAnon(page))
> -		extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
> +		extra_pins = PageSwapCache(page) ? thp_nr_pages(page) : 0;
>  	else
> -		extra_pins = HPAGE_PMD_NR;
> +		extra_pins = thp_nr_pages(page);
>  	if (pextra_pins)
>  		*pextra_pins = extra_pins;
>  	return total_mapcount(page) == page_count(page) - extra_pins - 1;

Acked-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying



* Re: [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out
  2020-09-16  1:40       ` Huang, Ying
@ 2020-09-16  6:09         ` SeongJae Park
  2020-09-30 12:13         ` Matthew Wilcox
  1 sibling, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2020-09-16  6:09 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Matthew Wilcox, SeongJae Park, linux-mm, Andrew Morton,
	Kirill A . Shutemov

On Wed, 16 Sep 2020 09:40:10 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:

> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Tue, Sep 15, 2020 at 09:40:45AM +0200, SeongJae Park wrote:
> >> On Tue,  8 Sep 2020 20:55:38 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> >> > Remove the assumption that a compound page has HPAGE_PMD_NR pins from
> >> > the page cache.
> >> > 
> >> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >> > Cc: Huang Ying <ying.huang@intel.com>
> >
> >> > -	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
> >> > -		HPAGE_PMD_NR : 1;
> >> > +	int page_cache_pins = thp_nr_pages(page);
> >> 
> >> Is it ok to remove the PageSwapCache() check?
> >
> > I think so?  My understanding is that it was added in commit bd4c82c22c36
> > to catch shmem pages, but there was really no reason to only do this for
> > shmem pages.
> 
> The original implementation is to write out Anonymous THP (not shmem).
> The code should work after the changing, because now any THP except
> normal Anonymous THP in swap cache will be split during reclaiming
> already.

Thanks for the kind explanation :)

> 
> Acked-by: "Huang, Ying" <ying.huang@intel.com>

Reviewed-by: SeongJae Park <sjpark@amazon.de>


Thanks,
SeongJae Park



* Re: [PATCH 11/11] mm/vmscan: Allow arbitrary sized pages to be paged out
  2020-09-16  1:40       ` Huang, Ying
  2020-09-16  6:09         ` SeongJae Park
@ 2020-09-30 12:13         ` Matthew Wilcox
  1 sibling, 0 replies; 39+ messages in thread
From: Matthew Wilcox @ 2020-09-30 12:13 UTC (permalink / raw)
  To: Huang, Ying; +Cc: SeongJae Park, linux-mm, Andrew Morton, Kirill A . Shutemov

On Wed, Sep 16, 2020 at 09:40:10AM +0800, Huang, Ying wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> > On Tue, Sep 15, 2020 at 09:40:45AM +0200, SeongJae Park wrote:
> >> On Tue,  8 Sep 2020 20:55:38 +0100 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> >> > Remove the assumption that a compound page has HPAGE_PMD_NR pins from
> >> > the page cache.
> >> > 
> >> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >> > Cc: Huang Ying <ying.huang@intel.com>
> >
> >> > -	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
> >> > -		HPAGE_PMD_NR : 1;
> >> > +	int page_cache_pins = thp_nr_pages(page);
> >> 
> >> Is it ok to remove the PageSwapCache() check?
> >
> > I think so?  My understanding is that it was added in commit bd4c82c22c36
> > to catch shmem pages, but there was really no reason to only do this for
> > shmem pages.
> 
> The original implementation is to write out Anonymous THP (not shmem).
> The code should work after the changing, because now any THP except
> normal Anonymous THP in swap cache will be split during reclaiming
> already.

Actually, that's a problem I just hit.  Simple to reproduce:

git clone git://git.infradead.org/users/willy/pagecache.git
build it, boot it:
mkdir /mnt/scratch; mkfs.xfs /dev/sdb; mount /dev/sdb /mnt/scratch/; dd if=/dev/zero of=/mnt/scratch/bigfile count=2000 bs=1M; umount /mnt/scratch/; mount /dev/sdb /mnt/scratch/; cat /mnt/scratch/bigfile >/dev/null

(the virtual machine i'm using only has 2GB of memory so this forces
vmscan to happen).  Anyway, we quickly run into OOM and get this kind
of report:

 active_anon:307 inactive_anon:4137 isolated_anon:0
  active_file:0 inactive_file:436964 isolated_file:192
  unevictable:0 dirty:0 writeback:0
  slab_reclaimable:3774 slab_unreclaimable:4132
  mapped:40 shmem:320 pagetables:167 bounce:0
  free:24315 free_pcp:0 free_cma:0

A little debugging shows split_huge_page_to_list() is failing because
the page still has page_private set, so the refcount is one higher
than expected.  This patch makes the problem go away:

@@ -1271,10 +1271,6 @@ static unsigned int shrink_page_list(struct list_head *page_list,
                                /* Adding to swap updated mapping */
                                mapping = page_mapping(page);
                        }
-               } else if (unlikely(PageTransHuge(page))) {
-                       /* Split file THP */
-                       if (split_huge_page_to_list(page, page_list))
-                               goto keep_locked;
                }
 
                /*

but I'm not sure what that's going to do to tmpfs/swap.  Could you guide
me here?



* Re: [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count
  2020-09-08 19:55 ` [PATCH 03/11] mm/page_owner: Change split_page_owner to take a count Matthew Wilcox (Oracle)
  2020-09-09 14:42   ` Kirill A. Shutemov
  2020-09-15  7:17   ` SeongJae Park
@ 2020-10-13 13:52   ` Matthew Wilcox
  2 siblings, 0 replies; 39+ messages in thread
From: Matthew Wilcox @ 2020-10-13 13:52 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: Kirill A . Shutemov, Huang Ying

Andrew, I missed one.  Thanks to Zi Yan for spotting this.

From 93abfc1e81a1c96e4603766ea33308b74b221a30 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Sat, 10 Oct 2020 11:19:05 -0400
Subject: [PATCH] mm: Fix call to split_page_owner

Missed this call.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 780c8f023b28..763bbcec65b7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3209,7 +3209,7 @@ void split_page(struct page *page, unsigned int order)
 
 	for (i = 1; i < (1 << order); i++)
 		set_page_refcounted(page + i);
-	split_page_owner(page, order);
+	split_page_owner(page, 1 << order);
 }
 EXPORT_SYMBOL_GPL(split_page);
 
-- 
2.28.0





