* Are THPs the right model for the pagecache?
@ 2020-11-13  4:46 Matthew Wilcox
  2020-11-13  6:39 ` John Hubbard
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Matthew Wilcox @ 2020-11-13  4:46 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel

When I started working on using larger pages in the page cache, I was
thinking about calling them large pages or lpages.  As I worked my way
through the code, I switched to simply adopting the transparent huge
page terminology that is used by anonymous and shmem.  I just changed
the definition so that a thp is a page of arbitrary order.

But now I'm wondering if that expediency has brought me to the right
place.  To enable THP, you have to select CONFIG_TRANSPARENT_HUGEPAGE,
which is only available on architectures which support using larger TLB
entries to map PMD-sized pages.  Fair enough, since that was the original
definition, but the point of supporting larger page sizes in the page
cache is to reduce software overhead.  Why shouldn't Alpha or m68k use
large pages in the page cache, even if they can't use them in their TLBs?

I'm also thinking about the number of asserts about
PageHead/PageTail/PageCompound and the repeated invocations of
compound_head().  If we had a different type for large pages, we could use
the compiler to assert these things instead of putting in runtime asserts.

IOWs, something like this:

struct lpage {
	struct page subpages[4];
};

static inline struct lpage *page_lpage(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	if (unlikely(head & 1))
		return (struct lpage *)(head - 1);
	return (struct lpage *)page;
}
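
To make the compile-time checking concrete, a quick sketch (lpage_mark_dirty()
is made up purely for illustration; the point is the pointer types):

void lpage_mark_dirty(struct lpage *lpage);	/* hypothetical */

void example(struct page *page)
{
	lpage_mark_dirty(page);			/* compiler complains: incompatible pointer type */
	lpage_mark_dirty(page_lpage(page));	/* fine, the conversion is explicit */
}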

We can then work our way through the code, distinguishing between
functions which really want to get an lpage (ie ones which currently
assert that they see only a PageHead) and functions which want to get
a particular subpage.

Some functions are going to need to be split.  eg pagecache_get_page()
currently takes an FGP_HEAD flag which determines whether it returns
a head page or the subpage for the index.  FGP_HEAD will have to
go away in favour of having separate pagecache_get_subpage() and
pagecache_get_lpage().  Or preferably, all callers of pagecache_get_page()
get converted to use lpages and they can call find_subpage() all by
themselves, if they need it.
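
For illustration, the split might look something like this (signatures are
only a guess, and during a transition they could simply wrap the existing
call):

static inline struct lpage *pagecache_get_lpage(struct address_space *mapping,
		pgoff_t index, int fgp_flags, gfp_t gfp)
{
	struct page *head = pagecache_get_page(mapping, index,
						fgp_flags | FGP_HEAD, gfp);

	return head ? page_lpage(head) : NULL;
}

static inline struct page *pagecache_get_subpage(struct address_space *mapping,
		pgoff_t index, int fgp_flags, gfp_t gfp)
{
	struct lpage *lpage = pagecache_get_lpage(mapping, index, fgp_flags, gfp);

	return lpage ? find_subpage(lpage->subpages, index) : NULL;
}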

Feels like a lot of work, but it can be done gradually.  My fear with
the current code is that filesystem writers who want to convert to
supporting THPs are not going to understand which interfaces expect a
THP and which expect a subpage.  For example, vmf->page (in the mkwrite
handler) is a subpage.  But the page passed to ->readpage is a THP.
I don't think we're going to be able to switch either of those any time
soon, so distinguishing them with a type seems only fair to fs authors.
See, for example, Darrick's reasonable question here:
https://lore.kernel.org/linux-fsdevel/20201014161216.GE9832@magnolia/
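
To sketch what the type split would look like from a filesystem's point of
view (myfs and myfs_dirty_lpage() are made up, and ->readpage does not take
an lpage today; this is the hoped-for end state, not the current interface):

static vm_fault_t myfs_page_mkwrite(struct vm_fault *vmf)
{
	/* vmf->page stays a struct page *, ie a subpage ... */
	struct lpage *lpage = page_lpage(vmf->page);	/* ... so the conversion is explicit */

	return myfs_dirty_lpage(lpage);			/* hypothetical helper */
}

/* ... while a converted ->readpage is handed the lpage directly: */
static int myfs_readpage(struct file *file, struct lpage *lpage);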

I'm not volunteering to do any of this in time for the next merge window!
I have lots of patches to get approved by various maintainers in the
next two weeks!


* Re: Are THPs the right model for the pagecache?
  2020-11-13  4:46 Are THPs the right model for the pagecache? Matthew Wilcox
@ 2020-11-13  6:39 ` John Hubbard
  2020-11-13 12:38   ` Matthew Wilcox
  2020-11-13  7:08 ` Hugh Dickins
  2020-11-13 15:19 ` Zi Yan
  2 siblings, 1 reply; 7+ messages in thread
From: John Hubbard @ 2020-11-13  6:39 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm, linux-fsdevel

On 11/12/20 8:46 PM, Matthew Wilcox wrote:
> When I started working on using larger pages in the page cache, I was
> thinking about calling them large pages or lpages.  As I worked my way
> through the code, I switched to simply adopting the transparent huge
> page terminology that is used by anonymous and shmem.  I just changed
> the definition so that a thp is a page of arbitrary order.
> 
> But now I'm wondering if that expediency has brought me to the right
> place.  To enable THP, you have to select CONFIG_TRANSPARENT_HUGEPAGE,
> which is only available on architectures which support using larger TLB
> entries to map PMD-sized pages.  Fair enough, since that was the original
> definition, but the point of supporting larger page sizes in the page
> cache is to reduce software overhead.  Why shouldn't Alpha or m68k use
> large pages in the page cache, even if they can't use them in their TLBs?
> 
> I'm also thinking about the number of asserts about
> PageHead/PageTail/PageCompound and the repeated invocations of
> compound_head().  If we had a different type for large pages, we could use
> the compiler to assert these things instead of putting in runtime asserts.

This seems like a really good idea to me, anyway. Even in the fairly
small area of gup.c, some type safety (normal pages vs. large pages)
would have helped keep things straight when I was fooling around with
pinning pages.


> 
> IOWs, something like this:
> 
> struct lpage {
> 	struct page subpages[4];
> };
> 
> static inline struct lpage *page_lpage(struct page *page)
> {
> 	unsigned long head = READ_ONCE(page->compound_head);
> 
> 	if (unlikely(head & 1))
> 		return (struct lpage *)(head - 1);
> 	return (struct lpage *)page;
> }

This is really a "get_head_page()" function, not a "get_large_page()"
function. But even renaming it doesn't seem quite right, because
wouldn't it be better to avoid discarding that tail bit information? In
other words, you might be looking at 3 cases, one of which is *not*
involving large pages at all:

     The page is a single, non-compound page.
     The page is a head page of a compound page
     The page is a tail page of a compound page

...but this function returns a type of "large page", even for the first
case. That's misleading, isn't it?

Given that you've said we could get compile time asserts, I guess you're
not envisioning writing any code that could get the first case at
runtime?

thanks,
-- 
John Hubbard
NVIDIA


* Re: Are THPs the right model for the pagecache?
  2020-11-13  4:46 Are THPs the right model for the pagecache? Matthew Wilcox
  2020-11-13  6:39 ` John Hubbard
@ 2020-11-13  7:08 ` Hugh Dickins
  2020-11-13 15:19 ` Zi Yan
  2 siblings, 0 replies; 7+ messages in thread
From: Hugh Dickins @ 2020-11-13  7:08 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Linux-FSDevel

On Thu, Nov 12, 2020 at 8:47 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> When I started working on using larger pages in the page cache, I was
> thinking about calling them large pages or lpages.  As I worked my way
> through the code, I switched to simply adopting the transparent huge
> page terminology that is used by anonymous and shmem.  I just changed
> the definition so that a thp is a page of arbitrary order.
>
> But now I'm wondering if that expediency has brought me to the right
> place.  To enable THP, you have to select CONFIG_TRANSPARENT_HUGEPAGE,
> which is only available on architectures which support using larger TLB
> entries to map PMD-sized pages.  Fair enough, since that was the original
> definition, but the point of supporting larger page sizes in the page
> cache is to reduce software overhead.  Why shouldn't Alpha or m68k use
> large pages in the page cache, even if they can't use them in their TLBs?

Yes, I strongly agree with this new position. While I understood your
desire to avoid all the confusions of yet another config option, it
always seemed a wrong direction to require THugeP support for your
not-necessarily-huge TLargePs. Most of the subtlety and significance
of traditional THPs lies with the page table mappings thereof; whereas
your TLPs or whatever are aiming for I/O and page cache efficiencies:
very useful, but a mistake to muddle it with the page table issues.

Hugh


* Re: Are THPs the right model for the pagecache?
  2020-11-13  6:39 ` John Hubbard
@ 2020-11-13 12:38   ` Matthew Wilcox
  2020-11-13 17:44     ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2020-11-13 12:38 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-mm, linux-fsdevel

On Thu, Nov 12, 2020 at 10:39:10PM -0800, John Hubbard wrote:
> > IOWs, something like this:
> > 
> > struct lpage {
> > 	struct page subpages[4];
> > };
> > 
> > static inline struct lpage *page_lpage(struct page *page)
> > {
> > 	unsigned long head = READ_ONCE(page->compound_head);
> > 
> > 	if (unlikely(head & 1))
> > 		return (struct lpage *)(head - 1);
> > 	return (struct lpage *)page;
> > }
> 
> This is really a "get_head_page()" function, not a "get_large_page()"
> function. But even renaming it doesn't seem quite right, because
> wouldn't it be better to avoid discarding that tail bit information? In
> other words, you might be looking at 3 cases, one of which is *not*
> involving large pages at all:
> 
>     The page is a single, non-compound page.
>     The page is a head page of a compound page
>     The page is a tail page of a compound page
> 
> ...but this function returns a type of "large page", even for the first
> case. That's misleading, isn't it?

Argh.  Yes, that's part of the problem, so this is still confusing.
An lpage might actually be an order-0 page.  Maybe it needs to be called
something that's not 'page' at all.  There are really four cases:

 - An order-0 page
 - A subpage that happens to be a tail page
 - A subpage that happens to be a head page
 - An order-N page

We have code today that treats tail pages as order-0 pages, but if
the subpage you happen to pass in is a head page, it'll work on the
entire page.  That must, surely, be a bug.

So what if we had:

/* Cache memory */
struct cmem {
	struct page pages[1];
};

Now there's a clear hierarchy.  The page cache stores pointers to cmem.

struct page *cmem_page(struct cmem *cmem, pgoff_t index)
{
	return &cmem->pages[index - cmem->pages[0].index];
}

struct cmem *page_cmem(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	if (unlikely(head & 1))
		return (struct cmem *)(head - 1);
	return (struct cmem *)page;
}

and we'll need the usual panoply of functions to get the order/size/...
of a cmem.  We'll also need functions like CMemDirty(), CMemLocked(),
CMemWriteback(), etc.
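
Most of those would just be thin, typed wrappers around what we already do
for head pages, eg (sketch only):

static inline unsigned int cmem_order(struct cmem *cmem)
{
	return compound_order(cmem->pages);
}

static inline bool CMemDirty(struct cmem *cmem)
{
	return PageDirty(cmem->pages);
}

static inline bool CMemLocked(struct cmem *cmem)
{
	return PageLocked(cmem->pages);
}

static inline bool CMemWriteback(struct cmem *cmem)
{
	return PageWriteback(cmem->pages);
}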



* Re: Are THPs the right model for the pagecache?
  2020-11-13  4:46 Are THPs the right model for the pagecache? Matthew Wilcox
  2020-11-13  6:39 ` John Hubbard
  2020-11-13  7:08 ` Hugh Dickins
@ 2020-11-13 15:19 ` Zi Yan
  2 siblings, 0 replies; 7+ messages in thread
From: Zi Yan @ 2020-11-13 15:19 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel


On 12 Nov 2020, at 23:46, Matthew Wilcox wrote:

> When I started working on using larger pages in the page cache, I was
> thinking about calling them large pages or lpages.  As I worked my way
> through the code, I switched to simply adopting the transparent huge
> page terminology that is used by anonymous and shmem.  I just changed
> the definition so that a thp is a page of arbitrary order.
>
> But now I'm wondering if that expediency has brought me to the right
> place.  To enable THP, you have to select CONFIG_TRANSPARENT_HUGEPAGE,
> which is only available on architectures which support using larger TLB
> entries to map PMD-sized pages.  Fair enough, since that was the original
> definition, but the point of supporting larger page sizes in the page
> cache is to reduce software overhead.  Why shouldn't Alpha or m68k use
> large pages in the page cache, even if they can't use them in their TLBs?

I think the issue comes from mixing physical page sizes with page table
entry sizes. THP really has two parts: the ability to manage a group of
pages using just PageHead, and the support for page table entry mappings
larger than a PTE (the smallest virtual address range mapped by a page
table entry). The first part should be independent of the second, although
the second relies on the first. Maybe it is possible to pull the code for
managing physical pages out of CONFIG_TRANSPARENT_HUGEPAGE and enable it
unconditionally, so any arch can take advantage of large pages.

—
Best Regards,
Yan Zi



* Re: Are THPs the right model for the pagecache?
  2020-11-13 12:38   ` Matthew Wilcox
@ 2020-11-13 17:44     ` Matthew Wilcox
  2020-11-13 19:44       ` John Hubbard
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2020-11-13 17:44 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-mm, linux-fsdevel

On Fri, Nov 13, 2020 at 12:38:36PM +0000, Matthew Wilcox wrote:
> So what if we had:
> 
> /* Cache memory */
> struct cmem {
> 	struct page pages[1];
> };

OK, that's a terrible name.  I went with 'folio' for this demonstration.
Other names suggested include album, sheaf and ream.

I think eventually we might get rid of the term 'compound pages'
altogether.  All compound pages become folios.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2c775d7d52f8..7d0ec5c1c7ac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -707,6 +707,21 @@ static inline void set_compound_order(struct page *page, unsigned int order)
 	page[1].compound_nr = 1U << order;
 }
 
+static inline unsigned int folio_order(struct folio *folio)
+{
+	return compound_order(folio->pages);
+}
+
+static inline unsigned long folio_nr_pages(struct folio *folio)
+{
+	return compound_nr(folio->pages);
+}
+
+static inline size_t folio_size(struct folio *folio)
+{
+	return PAGE_SIZE << folio_order(folio);
+}
+
 #include <linux/huge_mm.h>
 
 /*
@@ -814,9 +829,9 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
-static inline int head_compound_mapcount(struct page *head)
+static inline int folio_mapcount(struct folio *folio)
 {
-	return atomic_read(compound_mapcount_ptr(head)) + 1;
+	return atomic_read(folio_mapcount_ptr(folio)) + 1;
 }
 
 /*
@@ -827,8 +842,7 @@ static inline int head_compound_mapcount(struct page *head)
 static inline int compound_mapcount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
-	page = compound_head(page);
-	return head_compound_mapcount(page);
+	return folio_mapcount(page_folio(page));
 }
 
 /*
@@ -934,16 +948,15 @@ static inline bool hpage_pincount_available(struct page *page)
 	return PageCompound(page) && compound_order(page) > 1;
 }
 
-static inline int head_compound_pincount(struct page *head)
+static inline int folio_pincount(struct folio *folio)
 {
-	return atomic_read(compound_pincount_ptr(head));
+	return atomic_read(folio_pincount_ptr(folio));
 }
 
 static inline int compound_pincount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	page = compound_head(page);
-	return head_compound_pincount(page);
+	return folio_pincount(page_folio(page));
 }
 
 /* Returns the number of bytes in this potentially compound page. */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6681fca5011f..0161d7808d60 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -223,14 +223,35 @@ struct page {
 #endif
 } _struct_page_alignment;
 
-static inline atomic_t *compound_mapcount_ptr(struct page *page)
+/*
+ * A folio is a collection of pages.  There may be only one page in the folio.
+ */
+struct folio {
+	struct page pages[1];
+};
+
+static inline struct page *folio_page(struct folio *folio, pgoff_t index)
+{
+	return &folio->pages[index - folio->pages->index];
+}
+
+static inline struct folio *page_folio(struct page *page)
+{
+	unsigned long head = READ_ONCE(page->compound_head);
+
+	if (unlikely(head & 1))
+		return (struct folio *)(head - 1);
+	return (struct folio *)page;
+}
+
+static inline atomic_t *folio_mapcount_ptr(struct folio *folio)
 {
-	return &page[1].compound_mapcount;
+	return &folio->pages[1].compound_mapcount;
 }
 
-static inline atomic_t *compound_pincount_ptr(struct page *page)
+static inline atomic_t *folio_pincount_ptr(struct folio *folio)
 {
-	return &page[2].hpage_pinned_refcount;
+	return &folio->pages[2].hpage_pinned_refcount;
 }
 
 /*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0f762217a844..d2a259cd3771 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -652,7 +652,7 @@ static inline int PageTransCompound(struct page *page)
  */
 static inline int PageTransCompoundMap(struct page *page)
 {
-	struct page *head;
+	struct folio *folio;
 
 	if (!PageTransCompound(page))
 		return 0;
@@ -660,10 +660,10 @@ static inline int PageTransCompoundMap(struct page *page)
 	if (PageAnon(page))
 		return atomic_read(&page->_mapcount) < 0;
 
-	head = compound_head(page);
+	folio = page_folio(page);
 	/* File THP is PMD mapped and not PTE mapped */
 	return atomic_read(&page->_mapcount) ==
-	       atomic_read(compound_mapcount_ptr(head));
+	       atomic_read(folio_mapcount_ptr(folio));
 }
 
 /*
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 70085ca1a3fc..d4a353f30748 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -185,7 +185,11 @@ void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *,
 
 static inline void page_dup_rmap(struct page *page, bool compound)
 {
-	atomic_inc(compound ? compound_mapcount_ptr(page) : &page->_mapcount);
+	VM_BUG_ON_PAGE(compound && PageTail(page), page);
+	if (compound)
+		atomic_inc(folio_mapcount_ptr(page_folio(page)));
+	else
+		atomic_inc(&page->_mapcount);
 }
 
 /*
diff --git a/mm/debug.c b/mm/debug.c
index 8a40b3fefbeb..ab807dc79c2c 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -44,7 +44,7 @@ const struct trace_print_flags vmaflag_names[] = {
 
 void __dump_page(struct page *page, const char *reason)
 {
-	struct page *head = compound_head(page);
+	struct folio *folio = page_folio(page);
 	struct address_space *mapping;
 	bool page_poisoned = PagePoisoned(page);
 	bool compound = PageCompound(page);
@@ -68,7 +68,8 @@ void __dump_page(struct page *page, const char *reason)
 		goto hex_only;
 	}
 
-	if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
+	if (page < folio->pages ||
+	    (page >= folio->pages + MAX_ORDER_NR_PAGES)) {
 		/*
 		 * Corrupt page, so we cannot call page_mapping. Instead, do a
 		 * safe subset of the steps that page_mapping() does. Caution:
@@ -82,7 +83,7 @@ void __dump_page(struct page *page, const char *reason)
 			mapping = NULL;
 		else
 			mapping = (void *)(tmp & ~PAGE_MAPPING_FLAGS);
-		head = page;
+		folio = (struct folio *)page;
 		compound = false;
 	} else {
 		mapping = page_mapping(page);
@@ -93,21 +94,21 @@ void __dump_page(struct page *page, const char *reason)
 	 * page->_mapcount space in struct page is used by sl[aou]b pages to
 	 * encode own info.
 	 */
-	mapcount = PageSlab(head) ? 0 : page_mapcount(page);
+	mapcount = PageSlab(folio->pages) ? 0 : page_mapcount(page);
 
 	pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
-			page, page_ref_count(head), mapcount, mapping,
+			page, page_ref_count(folio->pages), mapcount, mapping,
 			page_to_pgoff(page), page_to_pfn(page));
 	if (compound) {
-		if (hpage_pincount_available(page)) {
-			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
-					head, compound_order(head),
-					head_compound_mapcount(head),
-					head_compound_pincount(head));
+		if (hpage_pincount_available(folio->pages)) {
+			pr_warn("folio:%p order:%u mapcount:%d pincount:%d\n",
+					folio, folio_order(folio),
+					folio_mapcount(folio),
+					folio_pincount(folio));
 		} else {
-			pr_warn("head:%p order:%u compound_mapcount:%d\n",
-					head, compound_order(head),
-					head_compound_mapcount(head));
+			pr_warn("folio:%p order:%u mapcount:%d\n",
+					folio, folio_order(folio),
+					folio_mapcount(folio));
 		}
 	}
 	if (PageKsm(page))
@@ -166,16 +167,16 @@ void __dump_page(struct page *page, const char *reason)
 out_mapping:
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
-	pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
-		page_cma ? " CMA" : "");
+	pr_warn("%sflags: %#lx(%pGp)%s\n", type, folio->pages->flags,
+			&(folio->pages->flags), page_cma ? " CMA" : "");
 
 hex_only:
 	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
 			sizeof(unsigned long), page,
 			sizeof(struct page), false);
-	if (head != page)
+	if (folio->pages != page)
 		print_hex_dump(KERN_WARNING, "head: ", DUMP_PREFIX_NONE, 32,
-			sizeof(unsigned long), head,
+			sizeof(unsigned long), folio->pages,
 			sizeof(struct page), false);
 
 	if (reason)
diff --git a/mm/gup.c b/mm/gup.c
index 49c4eabca271..fc51c845bc82 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,35 +29,31 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
-static void hpage_pincount_add(struct page *page, int refs)
+static void folio_pincount_add(struct folio *folio, int refs)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	VM_BUG_ON_PAGE(page != compound_head(page), page);
-
-	atomic_add(refs, compound_pincount_ptr(page));
+	VM_BUG_ON_PAGE(!hpage_pincount_available(folio->pages), folio->pages);
+	atomic_add(refs, folio_pincount_ptr(folio));
 }
 
-static void hpage_pincount_sub(struct page *page, int refs)
+static void folio_pincount_sub(struct folio *folio, int refs)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	VM_BUG_ON_PAGE(page != compound_head(page), page);
-
-	atomic_sub(refs, compound_pincount_ptr(page));
+	VM_BUG_ON_PAGE(!hpage_pincount_available(folio->pages), folio->pages);
+	atomic_sub(refs, folio_pincount_ptr(folio));
 }
 
 /*
  * Return the compound head page with ref appropriately incremented,
  * or NULL if that failed.
  */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
+static inline struct folio *try_get_folio(struct page *page, int refs)
 {
-	struct page *head = compound_head(page);
+	struct folio *folio = page_folio(page);
 
-	if (WARN_ON_ONCE(page_ref_count(head) < 0))
+	if (WARN_ON_ONCE(page_ref_count(folio->pages) < 0))
 		return NULL;
-	if (unlikely(!page_cache_add_speculative(head, refs)))
+	if (unlikely(!page_cache_add_speculative(folio->pages, refs)))
 		return NULL;
-	return head;
+	return folio;
 }
 
 /*
@@ -84,8 +80,9 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
 							  unsigned int flags)
 {
 	if (flags & FOLL_GET)
-		return try_get_compound_head(page, refs);
+		return try_get_folio(page, refs)->pages;
 	else if (flags & FOLL_PIN) {
+		struct folio *folio;
 		int orig_refs = refs;
 
 		/*
@@ -99,7 +96,7 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
 		/*
 		 * When pinning a compound page of order > 1 (which is what
 		 * hpage_pincount_available() checks for), use an exact count to
-		 * track it, via hpage_pincount_add/_sub().
+		 * track it, via folio_pincount_add/_sub().
 		 *
 		 * However, be sure to *also* increment the normal page refcount
 		 * field at least once, so that the page really is pinned.
@@ -107,17 +104,17 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
 		if (!hpage_pincount_available(page))
 			refs *= GUP_PIN_COUNTING_BIAS;
 
-		page = try_get_compound_head(page, refs);
-		if (!page)
+		folio = try_get_folio(page, refs);
+		if (!folio)
 			return NULL;
 
-		if (hpage_pincount_available(page))
-			hpage_pincount_add(page, refs);
+		if (hpage_pincount_available(folio->pages))
+			folio_pincount_add(folio, refs);
 
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
-				    orig_refs);
+		mod_node_page_state(page_pgdat(folio->pages),
+					NR_FOLL_PIN_ACQUIRED, orig_refs);
 
-		return page;
+		return folio->pages;
 	}
 
 	WARN_ON_ONCE(1);
@@ -152,62 +149,62 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (flags & FOLL_GET)
 		return try_get_page(page);
 	else if (flags & FOLL_PIN) {
+		struct folio *folio = page_folio(page);
 		int refs = 1;
 
-		page = compound_head(page);
-
-		if (WARN_ON_ONCE(page_ref_count(page) <= 0))
+		if (WARN_ON_ONCE(page_ref_count(folio->pages) <= 0))
 			return false;
 
-		if (hpage_pincount_available(page))
-			hpage_pincount_add(page, 1);
+		if (hpage_pincount_available(folio->pages))
+			folio_pincount_add(folio, 1);
 		else
 			refs = GUP_PIN_COUNTING_BIAS;
 
 		/*
 		 * Similar to try_grab_compound_head(): even if using the
-		 * hpage_pincount_add/_sub() routines, be sure to
+		 * folio_pincount_add/_sub() routines, be sure to
 		 * *also* increment the normal page refcount field at least
 		 * once, so that the page really is pinned.
 		 */
-		page_ref_add(page, refs);
+		page_ref_add(folio->pages, refs);
 
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED, 1);
+		mod_node_page_state(page_pgdat(folio->pages),
+				NR_FOLL_PIN_ACQUIRED, 1);
 	}
 
 	return true;
 }
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
-static bool __unpin_devmap_managed_user_page(struct page *page)
+static bool __unpin_devmap_managed_user_page(struct folio *folio)
 {
 	int count, refs = 1;
 
-	if (!page_is_devmap_managed(page))
+	if (!page_is_devmap_managed(folio->pages))
 		return false;
 
-	if (hpage_pincount_available(page))
-		hpage_pincount_sub(page, 1);
+	if (hpage_pincount_available(folio->pages))
+		folio_pincount_sub(folio, 1);
 	else
 		refs = GUP_PIN_COUNTING_BIAS;
 
-	count = page_ref_sub_return(page, refs);
+	count = page_ref_sub_return(folio->pages, refs);
 
-	mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
+	mod_node_page_state(page_pgdat(folio->pages), NR_FOLL_PIN_RELEASED, 1);
 	/*
 	 * devmap page refcounts are 1-based, rather than 0-based: if
 	 * refcount is 1, then the page is free and the refcount is
 	 * stable because nobody holds a reference on the page.
 	 */
 	if (count == 1)
-		free_devmap_managed_page(page);
+		free_devmap_managed_page(folio->pages);
 	else if (!count)
-		__put_page(page);
+		__put_page(folio->pages);
 
 	return true;
 }
 #else
-static bool __unpin_devmap_managed_user_page(struct page *page)
+static bool __unpin_devmap_managed_user_page(struct folio *folio)
 {
 	return false;
 }
@@ -224,28 +221,27 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
  */
 void unpin_user_page(struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	int refs = 1;
 
-	page = compound_head(page);
-
 	/*
 	 * For devmap managed pages we need to catch refcount transition from
 	 * GUP_PIN_COUNTING_BIAS to 1, when refcount reach one it means the
 	 * page is free and we need to inform the device driver through
 	 * callback. See include/linux/memremap.h and HMM for details.
 	 */
-	if (__unpin_devmap_managed_user_page(page))
+	if (__unpin_devmap_managed_user_page(folio))
 		return;
 
-	if (hpage_pincount_available(page))
-		hpage_pincount_sub(page, 1);
+	if (hpage_pincount_available(folio->pages))
+		folio_pincount_sub(folio, 1);
 	else
 		refs = GUP_PIN_COUNTING_BIAS;
 
-	if (page_ref_sub_and_test(page, refs))
-		__put_page(page);
+	if (page_ref_sub_and_test(folio->pages, refs))
+		__put_page(folio->pages);
 
-	mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
+	mod_node_page_state(page_pgdat(folio->pages), NR_FOLL_PIN_RELEASED, 1);
 }
 EXPORT_SYMBOL(unpin_user_page);
 
@@ -2074,24 +2070,26 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
 
 static void put_compound_head(struct page *page, int refs, unsigned int flags)
 {
+	struct folio *folio = (struct folio *)page;
+
 	if (flags & FOLL_PIN) {
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
-				    refs);
+		mod_node_page_state(page_pgdat(folio->pages),
+				NR_FOLL_PIN_RELEASED, refs);
 
-		if (hpage_pincount_available(page))
-			hpage_pincount_sub(page, refs);
+		if (hpage_pincount_available(folio->pages))
+			folio_pincount_sub(folio, refs);
 		else
 			refs *= GUP_PIN_COUNTING_BIAS;
 	}
 
-	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+	VM_BUG_ON_PAGE(page_ref_count(folio->pages) < refs, page);
 	/*
 	 * Calling put_page() for each ref is unnecessarily slow. Only the last
 	 * ref needs a put_page().
 	 */
 	if (refs > 1)
-		page_ref_sub(page, refs - 1);
-	put_page(page);
+		page_ref_sub(folio->pages, refs - 1);
+	put_page(folio->pages);
 }
 
 #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1bf51d3f2f2d..e145ba69aa81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2152,6 +2152,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 
 	if (!pmd_migration) {
+		struct folio *folio = (struct folio *)page;
 		/*
 		 * Set PG_double_map before dropping compound_mapcount to avoid
 		 * false-negative page_mapped().
@@ -2163,7 +2164,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 
 		lock_page_memcg(page);
-		if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
+		if (atomic_add_negative(-1, folio_mapcount_ptr(folio))) {
 			/* Last compound_mapcount is gone. */
 			__dec_lruvec_page_state(page, NR_ANON_THPS);
 			if (TestClearPageDoubleMap(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 16ef3d2346d5..19e3ccc2f5fd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1202,13 +1202,14 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 static void destroy_compound_gigantic_page(struct page *page,
 					unsigned int order)
 {
+	struct folio *folio = (struct folio *)page;
 	int i;
 	int nr_pages = 1 << order;
 	struct page *p = page + 1;
 
-	atomic_set(compound_mapcount_ptr(page), 0);
+	atomic_set(folio_mapcount_ptr(folio), 0);
 	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+		atomic_set(folio_pincount_ptr(folio), 0);
 
 	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
 		clear_compound_head(p);
@@ -1509,6 +1510,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 
 static void prep_compound_gigantic_page(struct page *page, unsigned int order)
 {
+	struct folio *folio = (struct folio *)page;
 	int i;
 	int nr_pages = 1 << order;
 	struct page *p = page + 1;
@@ -1534,10 +1536,10 @@ static void prep_compound_gigantic_page(struct page *page, unsigned int order)
 		set_page_count(p, 0);
 		set_compound_head(p, page);
 	}
-	atomic_set(compound_mapcount_ptr(page), -1);
+	atomic_set(folio_mapcount_ptr(folio), -1);
 
 	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+		atomic_set(folio_pincount_ptr(folio), 0);
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b7a378197dae..48c8d84828dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -693,6 +693,7 @@ void free_compound_page(struct page *page)
 
 void prep_compound_page(struct page *page, unsigned int order)
 {
+	struct folio *folio = (struct folio *)page;
 	int i;
 	int nr_pages = 1 << order;
 
@@ -706,9 +707,9 @@ void prep_compound_page(struct page *page, unsigned int order)
 
 	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
 	set_compound_order(page, order);
-	atomic_set(compound_mapcount_ptr(page), -1);
+	atomic_set(folio_mapcount_ptr(folio), -1);
 	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+		atomic_set(folio_pincount_ptr(folio), 0);
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/mm/rmap.c b/mm/rmap.c
index 6657000b18d4..e90da98ac2f6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1120,11 +1120,11 @@ void do_page_add_anon_rmap(struct page *page,
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 
 	if (compound) {
-		atomic_t *mapcount;
+		struct folio *folio = page_folio(page);
+
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-		mapcount = compound_mapcount_ptr(page);
-		first = atomic_inc_and_test(mapcount);
+		first = atomic_inc_and_test(folio_mapcount_ptr(folio));
 	} else {
 		first = atomic_inc_and_test(&page->_mapcount);
 	}
@@ -1174,11 +1174,13 @@ void page_add_new_anon_rmap(struct page *page,
 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
 	__SetPageSwapBacked(page);
 	if (compound) {
+		struct folio *folio = page_folio(page);
+
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
 		/* increment count (starts at -1) */
-		atomic_set(compound_mapcount_ptr(page), 0);
+		atomic_set(folio_mapcount_ptr(folio), 0);
 		if (hpage_pincount_available(page))
-			atomic_set(compound_pincount_ptr(page), 0);
+			atomic_set(folio_pincount_ptr(folio), 0);
 
 		__inc_lruvec_page_state(page, NR_ANON_THPS);
 	} else {
@@ -1200,6 +1202,7 @@ void page_add_new_anon_rmap(struct page *page,
  */
 void page_add_file_rmap(struct page *page, bool compound)
 {
+	struct folio *folio = page_folio(page);
 	int i, nr = 1;
 
 	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
@@ -1209,7 +1212,7 @@ void page_add_file_rmap(struct page *page, bool compound)
 			if (atomic_inc_and_test(&page[i]._mapcount))
 				nr++;
 		}
-		if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
+		if (!atomic_inc_and_test(folio_mapcount_ptr(folio)))
 			goto out;
 		if (PageSwapBacked(page))
 			__inc_node_page_state(page, NR_SHMEM_PMDMAPPED);
@@ -1219,9 +1222,9 @@ void page_add_file_rmap(struct page *page, bool compound)
 		if (PageTransCompound(page) && page_mapping(page)) {
 			VM_WARN_ON_ONCE(!PageLocked(page));
 
-			SetPageDoubleMap(compound_head(page));
+			SetPageDoubleMap(folio->pages);
 			if (PageMlocked(page))
-				clear_page_mlock(compound_head(page));
+				clear_page_mlock(folio->pages);
 		}
 		if (!atomic_inc_and_test(&page->_mapcount))
 			goto out;
@@ -1233,6 +1236,7 @@ void page_add_file_rmap(struct page *page, bool compound)
 
 static void page_remove_file_rmap(struct page *page, bool compound)
 {
+	struct folio *folio = page_folio(page);
 	int i, nr = 1;
 
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
@@ -1240,7 +1244,7 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 	/* Hugepages are not counted in NR_FILE_MAPPED for now. */
 	if (unlikely(PageHuge(page))) {
 		/* hugetlb pages are always mapped with pmds */
-		atomic_dec(compound_mapcount_ptr(page));
+		atomic_dec(folio_mapcount_ptr(folio));
 		return;
 	}
 
@@ -1250,7 +1254,7 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 			if (atomic_add_negative(-1, &page[i]._mapcount))
 				nr++;
 		}
-		if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
+		if (!atomic_add_negative(-1, folio_mapcount_ptr(folio)))
 			return;
 		if (PageSwapBacked(page))
 			__dec_node_page_state(page, NR_SHMEM_PMDMAPPED);
@@ -1274,9 +1278,10 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 
 static void page_remove_anon_compound_rmap(struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	int i, nr;
 
-	if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
+	if (!atomic_add_negative(-1, folio_mapcount_ptr(folio)))
 		return;
 
 	/* Hugepages are not counted in NR_ANON_PAGES for now. */
@@ -1965,7 +1970,7 @@ void hugepage_add_anon_rmap(struct page *page,
 	BUG_ON(!PageLocked(page));
 	BUG_ON(!anon_vma);
 	/* address might be in next vma when migration races vma_adjust */
-	first = atomic_inc_and_test(compound_mapcount_ptr(page));
+	first = atomic_inc_and_test(folio_mapcount_ptr(page_folio(page)));
 	if (first)
 		__page_set_anon_rmap(page, vma, address, 0);
 }
@@ -1973,10 +1978,12 @@ void hugepage_add_anon_rmap(struct page *page,
 void hugepage_add_new_anon_rmap(struct page *page,
 			struct vm_area_struct *vma, unsigned long address)
 {
+	struct folio *folio = page_folio(page);
+
 	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
-	atomic_set(compound_mapcount_ptr(page), 0);
+	atomic_set(folio_mapcount_ptr(folio), 0);
 	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+		atomic_set(folio_pincount_ptr(folio), 0);
 
 	__page_set_anon_rmap(page, vma, address, 1);
 }
diff --git a/mm/util.c b/mm/util.c
index 4ddb6e186dd5..a386908a92f1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -646,17 +646,18 @@ void *page_rmapping(struct page *page)
  */
 bool page_mapped(struct page *page)
 {
+	struct folio *folio;
 	int i;
 
 	if (likely(!PageCompound(page)))
 		return atomic_read(&page->_mapcount) >= 0;
-	page = compound_head(page);
-	if (atomic_read(compound_mapcount_ptr(page)) >= 0)
+	folio = page_folio(page);
+	if (atomic_read(folio_mapcount_ptr(folio)) >= 0)
 		return true;
-	if (PageHuge(page))
+	if (PageHuge(folio->pages))
 		return false;
-	for (i = 0; i < compound_nr(page); i++) {
-		if (atomic_read(&page[i]._mapcount) >= 0)
+	for (i = 0; i < folio_nr_pages(folio); i++) {
+		if (atomic_read(&folio->pages[i]._mapcount) >= 0)
 			return true;
 	}
 	return false;
@@ -712,6 +713,7 @@ struct address_space *page_mapping_file(struct page *page)
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
 {
+	struct folio *folio;
 	int ret;
 
 	ret = atomic_read(&page->_mapcount) + 1;
@@ -721,9 +723,9 @@ int __page_mapcount(struct page *page)
 	 */
 	if (!PageAnon(page) && !PageHuge(page))
 		return ret;
-	page = compound_head(page);
-	ret += atomic_read(compound_mapcount_ptr(page)) + 1;
-	if (PageDoubleMap(page))
+	folio = page_folio(page);
+	ret += atomic_read(folio_mapcount_ptr(folio)) + 1;
+	if (PageDoubleMap(folio->pages))
 		ret--;
 	return ret;
 }


* Re: Are THPs the right model for the pagecache?
  2020-11-13 17:44     ` Matthew Wilcox
@ 2020-11-13 19:44       ` John Hubbard
  0 siblings, 0 replies; 7+ messages in thread
From: John Hubbard @ 2020-11-13 19:44 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel

On 11/13/20 9:44 AM, Matthew Wilcox wrote:
> On Fri, Nov 13, 2020 at 12:38:36PM +0000, Matthew Wilcox wrote:
>> So what if we had:
>>
>> /* Cache memory */
>> struct cmem {
>> 	struct page pages[1];
>> };
> 
> OK, that's a terrible name.  I went with 'folio' for this demonstration.
> Other names suggested include album, sheaf and ream.

+1 for "folio", that's a good name! It stands out, as it should, given that
it's a different type. The others could work too, but this one is especially
nice because there are no pre-existing uses in the kernel, so no baggage.
And it's grep-able too.

Your demo diff looks good to me, and I think it noticeably improves things
and is worth doing. One tiny tweak first:

The struct member should be named .page, rather than .pages. That's because
pages in -mm usually refers to "struct page **", but here you've got a
struct page *.  Notice how the various odd things get better in the patch
if you change it to .page:

...
> +static inline atomic_t *folio_mapcount_ptr(struct folio *folio)
>   {
> -	return &page[1].compound_mapcount;
> +	return &folio->pages[1].compound_mapcount;

See, this diff is changing from "page" to "pages", but we don't
especially desire that, *and* it's arguably even more readable like
this anyway:

	return &folio->page[1].compound_mapcount;

and...

...
> @@ -166,16 +167,16 @@ void __dump_page(struct page *page, const char *reason)
>   out_mapping:
>   	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>   
> -	pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
> -		page_cma ? " CMA" : "");
> +	pr_warn("%sflags: %#lx(%pGp)%s\n", type, folio->pages->flags,
> +			&(folio->pages->flags), page_cma ? " CMA" : "");
>   
>   hex_only:
>   	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>   			sizeof(unsigned long), page,
>   			sizeof(struct page), false);
> -	if (head != page)
> +	if (folio->pages != page)

This one in particular gets a lot better:

	if (folio->page != page)

...is much more natural.
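
i.e. nothing more than (sketching just the rename):

struct folio {
	struct page page[1];	/* singular, so folio->page reads naturally */
};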

thanks,
-- 
John Hubbard
NVIDIA

