* [PATCH v3 00/14] Rearrange struct page
@ 2018-04-18 18:48 Matthew Wilcox
  2018-04-18 18:48 ` [PATCH v3 01/14] s390: Use _refcount for pgtables Matthew Wilcox
                   ` (13 more replies)
  0 siblings, 14 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

This is actually the combination of two previously posted series.
Since they both deal with rearranging struct page and the second series
depends on the first, I thought it best to combine them.

The overall motivation is to make it easier for people to use the space
in struct page if they've allocated it for their own purposes.  By the
end of the series, we end up with five consecutive words which can be
used almost arbitrarily by the owner.
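
As a rough illustration of that end state (a sketch, not code from this
series; my_cookie and my_state are placeholders), an owner that allocated
the page with alloc_pages() could use the first union like this:

	/* Hypothetical owner-private use of the five-word union (sketch).
	 * Bit 0 of the first word must stay clear (PageTail), and anything
	 * stored over page->mapping must be put back to NULL before the
	 * page is freed.
	 */
	struct page *page = alloc_pages(GFP_KERNEL, 0);

	INIT_LIST_HEAD(&page->lru);		/* words 1-2: generic list */
	page->index = my_cookie;		/* word 4: owner-defined tag */
	set_page_private(page, my_state);	/* word 5: owner-defined state */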

Highlights:
 - slub's counters no longer share space with _refcount.
 - slub's freelist+counters are now naturally dword aligned.
 - It's now more obvious what fields in struct page are used by which
   owners (some owners still take advantage of the union aliasing).
 - deferred_list now really exists in struct page instead of just a
   comment.
 - slub loses a parameter to a lot of functions.

Matthew Wilcox (14):
  s390: Use _refcount for pgtables
  mm: Split page_type out from _mapcount
  mm: Mark pages in use for page tables
  mm: Switch s_mem and slab_cache in struct page
  mm: Move 'private' union within struct page
  mm: Move _refcount out of struct page union
  slub: Remove page->counters
  mm: Combine first three unions in struct page
  mm: Use page->deferred_list
  mm: Move lru union within struct page
  mm: Combine first two unions in struct page
  mm: Improve struct page documentation
  slab,slub: Remove rcu_head size checks
  slub: Remove kmem_cache->reserved

 arch/s390/mm/pgalloc.c                 |  21 +--
 fs/proc/page.c                         |   2 +
 include/linux/mm.h                     |   2 +
 include/linux/mm_types.h               | 216 +++++++++++--------------
 include/linux/page-flags.h             |  51 +++---
 include/linux/slub_def.h               |   1 -
 include/uapi/linux/kernel-page-flags.h |   2 +-
 kernel/crash_core.c                    |   1 +
 mm/huge_memory.c                       |   7 +-
 mm/page_alloc.c                        |  17 +-
 mm/slab.c                              |   2 -
 mm/slub.c                              | 137 +++++++---------
 scripts/tags.sh                        |   6 +-
 tools/vm/page-types.c                  |   1 +
 14 files changed, 215 insertions(+), 251 deletions(-)

-- 
2.17.0


* [PATCH v3 01/14] s390: Use _refcount for pgtables
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
@ 2018-04-18 18:48 ` Matthew Wilcox
  2018-04-18 18:49 ` [PATCH v3 02/14] mm: Split page_type out from _mapcount Matthew Wilcox
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

s390 borrows the storage used for _mapcount in struct page in order to
account whether the bottom or top half is being used for 2kB page
tables.  I want to use that for something else, so use the top byte of
_refcount instead of the bottom byte of _mapcount.  _refcount may
temporarily be incremented by other CPUs that see a stale pointer to
this page in the page cache, but each CPU can only increment it by one,
and there are no systems with 2^24 CPUs today, so they will not change
the upper byte of _refcount.  We do have to be a little careful not to
lose any of their writes (as they will subsequently decrement the
counter).
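
Schematically (a sketch distilled from the diff below, reusing the
atomic_xor_bits() helper that is local to pgalloc.c; mask and bit are as
in the existing code):

	/* Sketch: the fragment-state byte moves from _mapcount to the top
	 * byte of _refcount.  Bits 0-23 stay a normal reference count, so
	 * a spurious get/put from a stale page-cache lookup cannot reach
	 * bit 24 -- that would take 2^24 simultaneous references.
	 */
	mask = atomic_read(&page->_refcount) >> 24;		/* read state byte */
	atomic_xor_bits(&page->_refcount, 1U << (bit + 24));	/* flip one state bit */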

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/mm/pgalloc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 562f72955956..84bd6329a88d 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -190,14 +190,15 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 		if (!list_empty(&mm->context.pgtable_list)) {
 			page = list_first_entry(&mm->context.pgtable_list,
 						struct page, lru);
-			mask = atomic_read(&page->_mapcount);
+			mask = atomic_read(&page->_refcount) >> 24;
 			mask = (mask | (mask >> 4)) & 3;
 			if (mask != 3) {
 				table = (unsigned long *) page_to_phys(page);
 				bit = mask & 1;		/* =1 -> second 2K */
 				if (bit)
 					table += PTRS_PER_PTE;
-				atomic_xor_bits(&page->_mapcount, 1U << bit);
+				atomic_xor_bits(&page->_refcount,
+							1U << (bit + 24));
 				list_del(&page->lru);
 			}
 		}
@@ -218,12 +219,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	table = (unsigned long *) page_to_phys(page);
 	if (mm_alloc_pgste(mm)) {
 		/* Return 4K page table with PGSTEs */
-		atomic_set(&page->_mapcount, 3);
+		atomic_xor_bits(&page->_refcount, 3 << 24);
 		memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
 		memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
 	} else {
 		/* Return the first 2K fragment of the page */
-		atomic_set(&page->_mapcount, 1);
+		atomic_xor_bits(&page->_refcount, 1 << 24);
 		memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
 		spin_lock_bh(&mm->context.lock);
 		list_add(&page->lru, &mm->context.pgtable_list);
@@ -242,7 +243,8 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 		/* Free 2K page table fragment of a 4K page */
 		bit = (__pa(table) & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t));
 		spin_lock_bh(&mm->context.lock);
-		mask = atomic_xor_bits(&page->_mapcount, 1U << bit);
+		mask = atomic_xor_bits(&page->_refcount, 1U << (bit + 24));
+		mask >>= 24;
 		if (mask & 3)
 			list_add(&page->lru, &mm->context.pgtable_list);
 		else
@@ -253,7 +255,6 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	}
 
 	pgtable_page_dtor(page);
-	atomic_set(&page->_mapcount, -1);
 	__free_page(page);
 }
 
@@ -274,7 +275,8 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 	}
 	bit = (__pa(table) & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t));
 	spin_lock_bh(&mm->context.lock);
-	mask = atomic_xor_bits(&page->_mapcount, 0x11U << bit);
+	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
+	mask >>= 24;
 	if (mask & 3)
 		list_add_tail(&page->lru, &mm->context.pgtable_list);
 	else
@@ -296,12 +298,13 @@ static void __tlb_remove_table(void *_table)
 		break;
 	case 1:		/* lower 2K of a 4K page table */
 	case 2:		/* higher 2K of a 4K page table */
-		if (atomic_xor_bits(&page->_mapcount, mask << 4) != 0)
+		mask = atomic_xor_bits(&page->_refcount, mask << (4 + 24));
+		mask >>= 24;
+		if (mask != 0)
 			break;
 		/* fallthrough */
 	case 3:		/* 4K page table with pgstes */
 		pgtable_page_dtor(page);
-		atomic_set(&page->_mapcount, -1);
 		__free_page(page);
 		break;
 	}
-- 
2.17.0


* [PATCH v3 02/14] mm: Split page_type out from _mapcount
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
  2018-04-18 18:48 ` [PATCH v3 01/14] s390: Use _refcount for pgtables Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19  9:04   ` Vlastimil Babka
  2018-04-20 15:17   ` Christopher Lameter
  2018-04-18 18:49 ` [PATCH v3 03/14] mm: Mark pages in use for page tables Matthew Wilcox
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

We're already using a union of many fields here, so stop abusing the
_mapcount and make page_type its own field.  That implies renaming some
of the machinery that creates PageBuddy, PageBalloon and PageKmemcg;
bring back the PG_buddy, PG_balloon and PG_kmemcg names.

As suggested by Kirill, make page_type a bitmask.  Because it starts out
life as -1 (thanks to sharing the storage with _mapcount), setting a
page flag means clearing the appropriate bit.  This gives us space for
probably twenty or so extra bits (depending how paranoid we want to be
about _mapcount underflow).
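
As a worked example (values taken from the definitions added below):

	/* page_type starts life as 0xffffffff (aliasing _mapcount == -1).
	 * With PAGE_TYPE_BASE == 0xf0000000 and PG_buddy == 0x00000080:
	 */
	__SetPageBuddy(page);	/* page_type &= ~PG_buddy  -> 0xffffff7f */
	PageBuddy(page);	/* (page_type & 0xf0000080) == 0xf0000000 */
	__ClearPageBuddy(page);	/* page_type |= PG_buddy   -> 0xffffffff */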

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h   | 13 ++++++-----
 include/linux/page-flags.h | 45 ++++++++++++++++++++++----------------
 kernel/crash_core.c        |  1 +
 mm/page_alloc.c            | 13 +++++------
 scripts/tags.sh            |  6 ++---
 5 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21612347d311..41828fb34860 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -96,6 +96,14 @@ struct page {
 	};
 
 	union {
+		/*
+		 * If the page is neither PageSlab nor mappable to userspace,
+		 * the value stored here may help determine what this page
+		 * is used for.  See page-flags.h for a list of page types
+		 * which are currently stored here.
+		 */
+		unsigned int page_type;
+
 		_slub_counter_t counters;
 		unsigned int active;		/* SLAB */
 		struct {			/* SLUB */
@@ -109,11 +117,6 @@ struct page {
 			/*
 			 * Count of ptes mapped in mms, to show when
 			 * page is mapped & limit reverse map searches.
-			 *
-			 * Extra information about page type may be
-			 * stored here for pages that are never mapped,
-			 * in which case the value MUST BE <= -2.
-			 * See page-flags.h for more details.
 			 */
 			atomic_t _mapcount;
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..8c25b28a35aa 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -642,49 +642,56 @@ PAGEFLAG_FALSE(DoubleMap)
 #endif
 
 /*
- * For pages that are never mapped to userspace, page->mapcount may be
- * used for storing extra information about page type. Any value used
- * for this purpose must be <= -2, but it's better start not too close
- * to -2 so that an underflow of the page_mapcount() won't be mistaken
- * for a special page.
+ * For pages that are never mapped to userspace (and aren't PageSlab),
+ * page_type may be used.  Because it is initialised to -1, we invert the
+ * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
+ * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
+ * low bits so that an underflow or overflow of page_mapcount() won't be
+ * mistaken for a page type value.
  */
-#define PAGE_MAPCOUNT_OPS(uname, lname)					\
+
+#define PAGE_TYPE_BASE	0xf0000000
+/* Reserve		0x0000007f to catch underflows of page_mapcount */
+#define PG_buddy	0x00000080
+#define PG_balloon	0x00000100
+#define PG_kmemcg	0x00000200
+
+#define PageType(page, flag)						\
+	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+
+#define PAGE_TYPE_OPS(uname, lname)					\
 static __always_inline int Page##uname(struct page *page)		\
 {									\
-	return atomic_read(&page->_mapcount) ==				\
-				PAGE_##lname##_MAPCOUNT_VALUE;		\
+	return PageType(page, PG_##lname);				\
 }									\
 static __always_inline void __SetPage##uname(struct page *page)		\
 {									\
-	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);	\
-	atomic_set(&page->_mapcount, PAGE_##lname##_MAPCOUNT_VALUE);	\
+	VM_BUG_ON_PAGE(!PageType(page, 0), page);			\
+	page->page_type &= ~PG_##lname;					\
 }									\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 {									\
 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
-	atomic_set(&page->_mapcount, -1);				\
+	page->page_type |= PG_##lname;					\
 }
 
 /*
- * PageBuddy() indicate that the page is free and in the buddy system
+ * PageBuddy() indicates that the page is free and in the buddy system
  * (see mm/page_alloc.c).
  */
-#define PAGE_BUDDY_MAPCOUNT_VALUE		(-128)
-PAGE_MAPCOUNT_OPS(Buddy, BUDDY)
+PAGE_TYPE_OPS(Buddy, buddy)
 
 /*
- * PageBalloon() is set on pages that are on the balloon page list
+ * PageBalloon() is true for pages that are on the balloon page list
  * (see mm/balloon_compaction.c).
  */
-#define PAGE_BALLOON_MAPCOUNT_VALUE		(-256)
-PAGE_MAPCOUNT_OPS(Balloon, BALLOON)
+PAGE_TYPE_OPS(Balloon, balloon)
 
 /*
  * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
  * pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
  */
-#define PAGE_KMEMCG_MAPCOUNT_VALUE		(-512)
-PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
+PAGE_TYPE_OPS(Kmemcg, kmemcg)
 
 extern bool is_free_buddy_page(struct page *page);
 
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index f7674d676889..b66aced5e8c2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -460,6 +460,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_hwpoison);
 #endif
 	VMCOREINFO_NUMBER(PG_head_mask);
+#define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 #ifdef CONFIG_HUGETLB_PAGE
 	VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7eebd6925b10..88e817d7ccef 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -706,16 +706,14 @@ static inline void rmv_page_order(struct page *page)
 
 /*
  * This function checks whether a page is free && is the buddy
- * we can do coalesce a page and its buddy if
+ * we can coalesce a page and its buddy if
  * (a) the buddy is not in a hole (check before calling!) &&
  * (b) the buddy is in the buddy system &&
  * (c) a page and its buddy have the same order &&
  * (d) a page and its buddy are in the same zone.
  *
- * For recording whether a page is in the buddy system, we set ->_mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE.
- * Setting, clearing, and testing _mapcount PAGE_BUDDY_MAPCOUNT_VALUE is
- * serialized by zone->lock.
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
  *
  * For recording page's order, we use page_private(page).
  */
@@ -760,9 +758,8 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
  * as necessary, plus some accounting needed to play nicely with other
  * parts of the VM system.
  * At each level, we keep a list of pages, which are heads of continuous
- * free pages of length of (1 << order) and marked with _mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE. Page's order is recorded in page_private(page)
- * field.
+ * free pages of length of (1 << order) and marked with PageBuddy.
+ * Page's order is recorded in page_private(page) field.
  * So when we are allocating or freeing one, we can derive the state of the
  * other.  That is, if we allocate a small block, and both were
  * free, the remainder of the region must be split into blocks.
diff --git a/scripts/tags.sh b/scripts/tags.sh
index 78e546ff689c..8c3ae36d4ea8 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -188,9 +188,9 @@ regex_c=(
 	'/\<CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/ClearPage\1/'
 	'/\<__CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/__ClearPage\1/'
 	'/\<TESTCLEARFLAG_FALSE(\([[:alnum:]_]*\).*/TestClearPage\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/Page\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/Page\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
 	'/^TASK_PFA_TEST([^,]*, *\([[:alnum:]_]*\))/task_\1/'
 	'/^TASK_PFA_SET([^,]*, *\([[:alnum:]_]*\))/task_set_\1/'
 	'/^TASK_PFA_CLEAR([^,]*, *\([[:alnum:]_]*\))/task_clear_\1/'
-- 
2.17.0


* [PATCH v3 03/14] mm: Mark pages in use for page tables
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
  2018-04-18 18:48 ` [PATCH v3 01/14] s390: Use _refcount for pgtables Matthew Wilcox
  2018-04-18 18:49 ` [PATCH v3 02/14] mm: Split page_type out from _mapcount Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19  9:30   ` Vlastimil Babka
  2018-04-18 18:49 ` [PATCH v3 04/14] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables.  This can be helpful when debugging crashdumps or
analysing memory fragmentation.  Add a KPF flag to report these pages
to userspace and update page-types.c to interpret that flag.

Note that only pages currently accounted as NR_PAGETABLE are tracked
as PageTable; this does not include pgd/p4d/pud/pmd pages.  Those will
be the subject of a later patch.
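
For example, a userspace sketch of reading the new bit back through
/proc/kpageflags (kpageflags_fd and pfn are assumed to have been set up
by the caller; KPF_PGTABLE is bit 26 as defined below):

	/* Sketch: test KPF_PGTABLE for a given pfn. */
	uint64_t flags;

	if (pread(kpageflags_fd, &flags, sizeof(flags),
		  pfn * sizeof(flags)) == sizeof(flags) &&
	    (flags & (1ULL << 26)))			/* KPF_PGTABLE */
		printf("pfn %lu is in use as a page table\n", pfn);

The updated page-types tool prints these pages as "pgtable".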

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/proc/page.c                         | 2 ++
 include/linux/mm.h                     | 2 ++
 include/linux/page-flags.h             | 6 ++++++
 include/uapi/linux/kernel-page-flags.h | 2 +-
 tools/vm/page-types.c                  | 1 +
 5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 1491918a33c3..792c78a49174 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -154,6 +154,8 @@ u64 stable_page_flags(struct page *page)
 
 	if (PageBalloon(page))
 		u |= 1 << KPF_BALLOON;
+	if (PageTable(page))
+		u |= 1 << KPF_PGTABLE;
 
 	if (page_is_idle(page))
 		u |= 1 << KPF_IDLE;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 974e8f8ffe03..5c6069219425 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1819,6 +1819,7 @@ static inline bool pgtable_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
 		return false;
+	__SetPageTable(page);
 	inc_zone_page_state(page, NR_PAGETABLE);
 	return true;
 }
@@ -1826,6 +1827,7 @@ static inline bool pgtable_page_ctor(struct page *page)
 static inline void pgtable_page_dtor(struct page *page)
 {
 	pte_lock_deinit(page);
+	__ClearPageTable(page);
 	dec_zone_page_state(page, NR_PAGETABLE);
 }
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8c25b28a35aa..901943e4754b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -655,6 +655,7 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PG_buddy	0x00000080
 #define PG_balloon	0x00000100
 #define PG_kmemcg	0x00000200
+#define PG_table	0x00000400
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -693,6 +694,11 @@ PAGE_TYPE_OPS(Balloon, balloon)
  */
 PAGE_TYPE_OPS(Kmemcg, kmemcg)
 
+/*
+ * Marks pages in use as page tables.
+ */
+PAGE_TYPE_OPS(Table, table)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index fa139841ec18..21b9113c69da 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -35,6 +35,6 @@
 #define KPF_BALLOON		23
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
-
+#define KPF_PGTABLE		26
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index a8783f48f77f..cce853dca691 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -131,6 +131,7 @@ static const char * const page_flag_names[] = {
 	[KPF_KSM]		= "x:ksm",
 	[KPF_THP]		= "t:thp",
 	[KPF_BALLOON]		= "o:balloon",
+	[KPF_PGTABLE]		= "g:pgtable",
 	[KPF_ZERO_PAGE]		= "z:zero_page",
 	[KPF_IDLE]              = "i:idle_page",
 
-- 
2.17.0


* [PATCH v3 04/14] mm: Switch s_mem and slab_cache in struct page
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (2 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 03/14] mm: Mark pages in use for page tables Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 11:06   ` Vlastimil Babka
  2018-04-18 18:49 ` [PATCH v3 05/14] mm: Move 'private' union within " Matthew Wilcox
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

Now that slab_cache shares storage with page->mapping, slub needs to
set page->mapping to NULL as it frees the page, just like slab does.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 4 ++--
 mm/slub.c                | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 41828fb34860..e97a310a6abe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -83,7 +83,7 @@ struct page {
 		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
 		struct address_space *mapping;
 
-		void *s_mem;			/* slab first object */
+		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
 		atomic_t compound_mapcount;	/* first tail page */
 		/* page_deferred_list().next	 -- second tail page */
 	};
@@ -194,7 +194,7 @@ struct page {
 		spinlock_t ptl;
 #endif
 #endif
-		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
+		void *s_mem;			/* slab first object */
 	};
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/slub.c b/mm/slub.c
index 099925cf456a..27b6ba1c116a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1690,6 +1690,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__ClearPageSlab(page);
 
 	page_mapcount_reset(page);
+	page->mapping = NULL;
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
 	memcg_uncharge_slab(page, order, s);
-- 
2.17.0


* [PATCH v3 05/14] mm: Move 'private' union within struct page
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (3 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 04/14] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 11:31   ` Vlastimil Babka
                     ` (2 more replies)
  2018-04-18 18:49 ` [PATCH v3 06/14] mm: Move _refcount out of struct page union Matthew Wilcox
                   ` (8 subsequent siblings)
  13 siblings, 3 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

By moving page->private to the fourth word of struct page, we can put
the SLUB counters in the same word as SLAB's s_mem and still do the
cmpxchg_double trick.  Now the SLUB counters no longer overlap with
the refcount.
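
The constraint being preserved, as a sketch of the resulting 64-bit
layout (word numbers are for orientation only, not spelled out in the
patch):

	/* word 0: flags
	 * word 1: mapping / slab_cache / ...
	 * word 2: index / freelist                <- double-word aligned
	 * word 3: private / s_mem / SLUB counters (inuse:objects:frozen)
	 *
	 * cmpxchg_double() operates on two adjacent words, so SLUB's
	 * freelist (word 2) and counters (word 3) remain a legal target,
	 * and neither overlaps _refcount any more.
	 */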

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 54 ++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e97a310a6abe..e83fef8c74d9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -65,14 +65,8 @@ struct hmm;
  */
 #ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
 #define _struct_page_alignment	__aligned(2 * sizeof(unsigned long))
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE)
-#define _slub_counter_t		unsigned long
-#else
-#define _slub_counter_t		unsigned int
-#endif
 #else /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
 #define _struct_page_alignment
-#define _slub_counter_t		unsigned int
 #endif /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
 
 struct page {
@@ -95,6 +89,30 @@ struct page {
 		/* page_deferred_list().prev	-- second tail page */
 	};
 
+	union {
+		/*
+		 * Mapping-private opaque data:
+		 * Usually used for buffer_heads if PagePrivate
+		 * Used for swp_entry_t if PageSwapCache
+		 * Indicates order in the buddy system if PageBuddy
+		 */
+		unsigned long private;
+#if USE_SPLIT_PTE_PTLOCKS
+#if ALLOC_SPLIT_PTLOCKS
+		spinlock_t *ptl;
+#else
+		spinlock_t ptl;
+#endif
+#endif
+		void *s_mem;			/* slab first object */
+		unsigned long counters;		/* SLUB */
+		struct {			/* SLUB */
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
+		};
+	};
+
 	union {
 		/*
 		 * If the page is neither PageSlab nor mappable to userspace,
@@ -104,13 +122,7 @@ struct page {
 		 */
 		unsigned int page_type;
 
-		_slub_counter_t counters;
 		unsigned int active;		/* SLAB */
-		struct {			/* SLUB */
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
-		};
 		int units;			/* SLOB */
 
 		struct {			/* Page cache */
@@ -179,24 +191,6 @@ struct page {
 #endif
 	};
 
-	union {
-		/*
-		 * Mapping-private opaque data:
-		 * Usually used for buffer_heads if PagePrivate
-		 * Used for swp_entry_t if PageSwapCache
-		 * Indicates order in the buddy system if PageBuddy
-		 */
-		unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
-#if ALLOC_SPLIT_PTLOCKS
-		spinlock_t *ptl;
-#else
-		spinlock_t ptl;
-#endif
-#endif
-		void *s_mem;			/* slab first object */
-	};
-
 #ifdef CONFIG_MEMCG
 	struct mem_cgroup *mem_cgroup;
 #endif
-- 
2.17.0


* [PATCH v3 06/14] mm: Move _refcount out of struct page union
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (4 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 05/14] mm: Move 'private' union within " Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 11:37   ` Vlastimil Babka
  2018-04-30  9:40   ` Kirill A. Shutemov
  2018-04-18 18:49 ` [PATCH v3 07/14] slub: Remove page->counters Matthew Wilcox
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

Keeping the refcount in the union only encourages people to put
something else in the union which will overlap with _refcount and
eventually explode messily.  pahole reports no fields change location.
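
The failure mode this guards against looks roughly like the following
sketch (not code from the tree); the before/after layouts can be
compared with pahole -C page on a vmlinux built with debug info:

	/* Sketch: the page cache may take a speculative reference on a
	 * page that has already been freed and reused by another owner,
	 * so no owner-private field may ever share storage with _refcount.
	 */
	if (get_page_unless_zero(page)) {
		/* ... recheck the page is still the one we looked up ... */
		put_page(page);
	}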

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e83fef8c74d9..9c048a512695 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -113,7 +113,13 @@ struct page {
 		};
 	};
 
-	union {
+	union {		/* This union is 4 bytes in size. */
+		/*
+		 * If the page can be mapped to userspace, encodes the number
+		 * of times this page is referenced by a page table.
+		 */
+		atomic_t _mapcount;
+
 		/*
 		 * If the page is neither PageSlab nor mappable to userspace,
 		 * the value stored here may help determine what this page
@@ -124,22 +130,11 @@ struct page {
 
 		unsigned int active;		/* SLAB */
 		int units;			/* SLOB */
-
-		struct {			/* Page cache */
-			/*
-			 * Count of ptes mapped in mms, to show when
-			 * page is mapped & limit reverse map searches.
-			 */
-			atomic_t _mapcount;
-
-			/*
-			 * Usage count, *USE WRAPPER FUNCTION* when manual
-			 * accounting. See page_ref.h
-			 */
-			atomic_t _refcount;
-		};
 	};
 
+	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
+	atomic_t _refcount;
+
 	/*
 	 * WARNING: bit 0 of the first word encode PageTail(). That means
 	 * the rest users of the storage space MUST NOT use the bit to
-- 
2.17.0


* [PATCH v3 07/14] slub: Remove page->counters
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (5 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 06/14] mm: Move _refcount out of struct page union Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 13:42   ` Vlastimil Babka
  2018-04-18 18:49 ` [PATCH v3 08/14] mm: Combine first three unions in struct page Matthew Wilcox
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

Use page->private instead, now that these two fields are in the same
location.  Include a compile-time assert that the fields don't get out
of sync.
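
On 64-bit, the asserts amount to the following (a sketch of the assumed
offsets, not text from the patch):

	/* offsetof(struct page, freelist) == 16  (2 * sizeof(long) aligned)
	 * offsetof(struct page, private)  == 24  (the very next word)
	 *
	 * so cmpxchg_double(&page->freelist, &page->private, ...) updates
	 * the free list pointer and the packed counters as a single
	 * 16-byte unit, exactly as it previously did via page->counters.
	 */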

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h |  5 ++-
 mm/slub.c                | 68 ++++++++++++++++++----------------------
 2 files changed, 33 insertions(+), 40 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9c048a512695..04d9dc442029 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -65,9 +65,9 @@ struct hmm;
  */
 #ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
 #define _struct_page_alignment	__aligned(2 * sizeof(unsigned long))
-#else /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
+#else
 #define _struct_page_alignment
-#endif /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
+#endif
 
 struct page {
 	/* First double word block */
@@ -105,7 +105,6 @@ struct page {
 #endif
 #endif
 		void *s_mem;			/* slab first object */
-		unsigned long counters;		/* SLUB */
 		struct {			/* SLUB */
 			unsigned inuse:16;
 			unsigned objects:15;
diff --git a/mm/slub.c b/mm/slub.c
index 27b6ba1c116a..f2f64568b25e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -55,8 +55,9 @@
  *   have the ability to do a cmpxchg_double. It only protects the second
  *   double word in the page struct. Meaning
  *	A. page->freelist	-> List of object free in a page
- *	B. page->counters	-> Counters of objects
- *	C. page->frozen		-> frozen state
+ *	B. page->inuse		-> Number of objects in use
+ *	C. page->objects	-> Number of objects in page
+ *	D. page->frozen		-> frozen state
  *
  *   If a slab is frozen then it is exempt from list management. It is not
  *   on any list. The processor that froze the slab is the one who can
@@ -358,17 +359,10 @@ static __always_inline void slab_unlock(struct page *page)
 
 static inline void set_page_slub_counters(struct page *page, unsigned long counters_new)
 {
-	struct page tmp;
-	tmp.counters = counters_new;
-	/*
-	 * page->counters can cover frozen/inuse/objects as well
-	 * as page->_refcount.  If we assign to ->counters directly
-	 * we run the risk of losing updates to page->_refcount, so
-	 * be careful and only assign to the fields we need.
-	 */
-	page->frozen  = tmp.frozen;
-	page->inuse   = tmp.inuse;
-	page->objects = tmp.objects;
+	BUILD_BUG_ON(offsetof(struct page, freelist) + sizeof(void *) !=
+			offsetof(struct page, private));
+	BUILD_BUG_ON(offsetof(struct page, freelist) % (2 * sizeof(void *)));
+	page->private = counters_new;
 }
 
 /* Interrupts must be disabled (for the fallback code to work right) */
@@ -381,7 +375,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(&page->freelist, &page->private,
 				   freelist_old, counters_old,
 				   freelist_new, counters_new))
 			return true;
@@ -390,7 +384,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
 	{
 		slab_lock(page);
 		if (page->freelist == freelist_old &&
-					page->counters == counters_old) {
+					page->private == counters_old) {
 			page->freelist = freelist_new;
 			set_page_slub_counters(page, counters_new);
 			slab_unlock(page);
@@ -417,7 +411,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(&page->freelist, &page->private,
 				   freelist_old, counters_old,
 				   freelist_new, counters_new))
 			return true;
@@ -429,7 +423,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		local_irq_save(flags);
 		slab_lock(page);
 		if (page->freelist == freelist_old &&
-					page->counters == counters_old) {
+					page->private == counters_old) {
 			page->freelist = freelist_new;
 			set_page_slub_counters(page, counters_new);
 			slab_unlock(page);
@@ -1788,8 +1782,8 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	 * per cpu allocation list.
 	 */
 	freelist = page->freelist;
-	counters = page->counters;
-	new.counters = counters;
+	counters = page->private;
+	new.private = counters;
 	*objects = new.objects - new.inuse;
 	if (mode) {
 		new.inuse = page->objects;
@@ -1803,7 +1797,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
 
 	if (!__cmpxchg_double_slab(s, page,
 			freelist, counters,
-			new.freelist, new.counters,
+			new.freelist, new.private,
 			"acquire_slab"))
 		return NULL;
 
@@ -2050,15 +2044,15 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 
 		do {
 			prior = page->freelist;
-			counters = page->counters;
+			counters = page->private;
 			set_freepointer(s, freelist, prior);
-			new.counters = counters;
+			new.private = counters;
 			new.inuse--;
 			VM_BUG_ON(!new.frozen);
 
 		} while (!__cmpxchg_double_slab(s, page,
 			prior, counters,
-			freelist, new.counters,
+			freelist, new.private,
 			"drain percpu freelist"));
 
 		freelist = nextfree;
@@ -2081,11 +2075,11 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 redo:
 
 	old.freelist = page->freelist;
-	old.counters = page->counters;
+	old.private = page->private;
 	VM_BUG_ON(!old.frozen);
 
 	/* Determine target state of the slab */
-	new.counters = old.counters;
+	new.private = old.private;
 	if (freelist) {
 		new.inuse--;
 		set_freepointer(s, freelist, old.freelist);
@@ -2146,8 +2140,8 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 
 	l = m;
 	if (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				old.freelist, old.private,
+				new.freelist, new.private,
 				"unfreezing slab"))
 		goto redo;
 
@@ -2196,17 +2190,17 @@ static void unfreeze_partials(struct kmem_cache *s,
 		do {
 
 			old.freelist = page->freelist;
-			old.counters = page->counters;
+			old.private = page->private;
 			VM_BUG_ON(!old.frozen);
 
-			new.counters = old.counters;
+			new.private = old.private;
 			new.freelist = old.freelist;
 
 			new.frozen = 0;
 
 		} while (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				old.freelist, old.private,
+				new.freelist, new.private,
 				"unfreezing slab"));
 
 		if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
@@ -2495,9 +2489,9 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 
 	do {
 		freelist = page->freelist;
-		counters = page->counters;
+		counters = page->private;
 
-		new.counters = counters;
+		new.private = counters;
 		VM_BUG_ON(!new.frozen);
 
 		new.inuse = page->objects;
@@ -2505,7 +2499,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 
 	} while (!__cmpxchg_double_slab(s, page,
 		freelist, counters,
-		NULL, new.counters,
+		NULL, new.private,
 		"get_freelist"));
 
 	return freelist;
@@ -2830,9 +2824,9 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 			n = NULL;
 		}
 		prior = page->freelist;
-		counters = page->counters;
+		counters = page->private;
 		set_freepointer(s, tail, prior);
-		new.counters = counters;
+		new.private = counters;
 		was_frozen = new.frozen;
 		new.inuse -= cnt;
 		if ((!new.inuse || !prior) && !was_frozen) {
@@ -2865,7 +2859,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 
 	} while (!cmpxchg_double_slab(s, page,
 		prior, counters,
-		head, new.counters,
+		head, new.private,
 		"__slab_free"));
 
 	if (likely(!n)) {
-- 
2.17.0


* [PATCH v3 08/14] mm: Combine first three unions in struct page
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (6 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 07/14] slub: Remove page->counters Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 13:46   ` Vlastimil Babka
  2018-04-30  9:42   ` Kirill A. Shutemov
  2018-04-18 18:49 ` [PATCH v3 09/14] mm: Use page->deferred_list Matthew Wilcox
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

By combining these three one-word unions into one three-word union,
we make it easier for users to add their own multi-word fields to struct
page, as well as making it obvious that SLUB needs to keep its double-word
alignment for its freelist & counters.

No field moves position; verified with pahole.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 65 ++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 04d9dc442029..39521b8385c1 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -70,45 +70,44 @@ struct hmm;
 #endif
 
 struct page {
-	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
-	union {
-		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
-		struct address_space *mapping;
-
-		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
+	union {		/* This union is three words (12/24 bytes) in size */
+		struct {	/* Page cache and anonymous pages */
+			/* See page-flags.h for PAGE_MAPPING_FLAGS */
+			struct address_space *mapping;
+			pgoff_t index;		/* Our offset within mapping. */
+			/**
+			 * @private: Mapping-private opaque data.
+			 * Usually used for buffer_heads if PagePrivate.
+			 * Used for swp_entry_t if PageSwapCache.
+			 * Indicates order in the buddy system if PageBuddy.
+			 */
+			unsigned long private;
+		};
+		struct {	/* slab and slob */
+			struct kmem_cache *slab_cache;
+			void *freelist;		/* first free object */
+			void *s_mem;		/* first object */
+		};
+		struct {	/* slub also uses some of the slab fields */
+			struct kmem_cache *slub_cache;
+			/* Double-word boundary */
+			void *slub_freelist;
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
+		};
 		atomic_t compound_mapcount;	/* first tail page */
-		/* page_deferred_list().next	 -- second tail page */
-	};
-
-	/* Second double word */
-	union {
-		pgoff_t index;		/* Our offset within mapping. */
-		void *freelist;		/* sl[aou]b first free object */
-		/* page_deferred_list().prev	-- second tail page */
-	};
-
-	union {
-		/*
-		 * Mapping-private opaque data:
-		 * Usually used for buffer_heads if PagePrivate
-		 * Used for swp_entry_t if PageSwapCache
-		 * Indicates order in the buddy system if PageBuddy
-		 */
-		unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
+		struct list_head deferred_list; /* second tail page */
+		struct {	/* Page table pages */
+			unsigned long _ptl_pad_1;
+			unsigned long _ptl_pad_2;
 #if ALLOC_SPLIT_PTLOCKS
-		spinlock_t *ptl;
+			spinlock_t *ptl;
 #else
-		spinlock_t ptl;
-#endif
+			spinlock_t ptl;
 #endif
-		void *s_mem;			/* slab first object */
-		struct {			/* SLUB */
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
 		};
 	};
 
-- 
2.17.0


* [PATCH v3 09/14] mm: Use page->deferred_list
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (7 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 08/14] mm: Combine first three unions in struct page Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 13:23   ` Vlastimil Babka
  2018-04-30  9:43   ` Kirill A. Shutemov
  2018-04-18 18:49 ` [PATCH v3 10/14] mm: Move lru union within struct page Matthew Wilcox
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

Now that we can represent the location of 'deferred_list' in C instead
of comments, make use of that ability.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/huge_memory.c | 7 ++-----
 mm/page_alloc.c  | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 14ed6ee5e02f..55ad852fbf17 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -483,11 +483,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 static inline struct list_head *page_deferred_list(struct page *page)
 {
-	/*
-	 * ->lru in the tail pages is occupied by compound_head.
-	 * Let's use ->mapping + ->index in the second tail page as list_head.
-	 */
-	return (struct list_head *)&page[2].mapping;
+	/* ->lru in the tail pages is occupied by compound_head. */
+	return &page[2].deferred_list;
 }
 
 void prep_transhuge_page(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 88e817d7ccef..18720eccbce1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -953,7 +953,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	case 2:
 		/*
 		 * the second tail page: ->mapping is
-		 * page_deferred_list().next -- ignore value.
+		 * deferred_list.next -- ignore value.
 		 */
 		break;
 	default:
-- 
2.17.0


* [PATCH v3 10/14] mm: Move lru union within struct page
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (8 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 09/14] mm: Use page->deferred_list Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 13:56   ` Vlastimil Babka
  2018-04-30  9:44   ` Kirill A. Shutemov
  2018-04-18 18:49 ` [PATCH v3 11/14] mm: Combine first two unions in " Matthew Wilcox
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

Since the LRU is two words, this does not affect the double-word
alignment of SLUB's freelist.
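
In 64-bit word offsets, the result looks roughly like this (a sketch
for orientation, not part of the patch):

	/* word 0:    flags
	 * words 1-2: lru / rcu_head / compound_head + dtor/order / ...
	 * word 3:    mapping / slab_cache / slub_cache
	 * word 4:    index / freelist / slub_freelist   <- still on a
	 * word 5:    private / s_mem / SLUB counters       double-word boundary
	 */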

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 102 +++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 39521b8385c1..230d473f16da 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -72,6 +72,57 @@ struct hmm;
 struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
+	/*
+	 * WARNING: bit 0 of the first word encode PageTail(). That means
+	 * the rest users of the storage space MUST NOT use the bit to
+	 * avoid collision and false-positive PageTail().
+	 */
+	union {
+		struct list_head lru;	/* Pageout list, eg. active_list
+					 * protected by zone_lru_lock !
+					 * Can be used as a generic list
+					 * by the page owner.
+					 */
+		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
+					    * lru or handled by a slab
+					    * allocator, this points to the
+					    * hosting device page map.
+					    */
+		struct {		/* slub per cpu partial pages */
+			struct page *next;	/* Next partial slab */
+#ifdef CONFIG_64BIT
+			int pages;	/* Nr of partial slabs left */
+			int pobjects;	/* Approximate # of objects */
+#else
+			short int pages;
+			short int pobjects;
+#endif
+		};
+
+		struct rcu_head rcu_head;	/* Used by SLAB
+						 * when destroying via RCU
+						 */
+		/* Tail pages of compound page */
+		struct {
+			unsigned long compound_head; /* If bit zero is set */
+
+			/* First tail page only */
+			unsigned char compound_dtor;
+			unsigned char compound_order;
+			/* two/six bytes available here */
+		};
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+		struct {
+			unsigned long __pad;	/* do not overlay pmd_huge_pte
+						 * with compound_head to avoid
+						 * possible bit 0 collision.
+						 */
+			pgtable_t pmd_huge_pte; /* protected by page->ptl */
+		};
+#endif
+	};
+
 	union {		/* This union is three words (12/24 bytes) in size */
 		struct {	/* Page cache and anonymous pages */
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
@@ -133,57 +184,6 @@ struct page {
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
 	atomic_t _refcount;
 
-	/*
-	 * WARNING: bit 0 of the first word encode PageTail(). That means
-	 * the rest users of the storage space MUST NOT use the bit to
-	 * avoid collision and false-positive PageTail().
-	 */
-	union {
-		struct list_head lru;	/* Pageout list, eg. active_list
-					 * protected by zone_lru_lock !
-					 * Can be used as a generic list
-					 * by the page owner.
-					 */
-		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
-					    * lru or handled by a slab
-					    * allocator, this points to the
-					    * hosting device page map.
-					    */
-		struct {		/* slub per cpu partial pages */
-			struct page *next;	/* Next partial slab */
-#ifdef CONFIG_64BIT
-			int pages;	/* Nr of partial slabs left */
-			int pobjects;	/* Approximate # of objects */
-#else
-			short int pages;
-			short int pobjects;
-#endif
-		};
-
-		struct rcu_head rcu_head;	/* Used by SLAB
-						 * when destroying via RCU
-						 */
-		/* Tail pages of compound page */
-		struct {
-			unsigned long compound_head; /* If bit zero is set */
-
-			/* First tail page only */
-			unsigned char compound_dtor;
-			unsigned char compound_order;
-			/* two/six bytes available here */
-		};
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
-		struct {
-			unsigned long __pad;	/* do not overlay pmd_huge_pte
-						 * with compound_head to avoid
-						 * possible bit 0 collision.
-						 */
-			pgtable_t pmd_huge_pte; /* protected by page->ptl */
-		};
-#endif
-	};
-
 #ifdef CONFIG_MEMCG
 	struct mem_cgroup *mem_cgroup;
 #endif
-- 
2.17.0


* [PATCH v3 11/14] mm: Combine first two unions in struct page
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (9 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 10/14] mm: Move lru union within struct page Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-19 14:03   ` Vlastimil Babka
  2018-04-30  9:47   ` Kirill A. Shutemov
  2018-04-18 18:49 ` [PATCH v3 12/14] mm: Improve struct page documentation Matthew Wilcox
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

This gives us five words of space in a single union in struct page.
The compound_mapcount moves position (from offset 24 to offset 20)
on 64-bit systems, but that does not seem likely to cause any trouble.
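
In 64-bit byte offsets, the first tail page now looks roughly like this
(a sketch derived from the new layout):

	/* offset  0: flags
	 * offset  8: compound_head          (bit 0 set)
	 * offset 16: compound_dtor
	 * offset 17: compound_order
	 * offset 20: compound_mapcount      (previously at offset 24)
	 */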

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 96 ++++++++++++++++++----------------------
 mm/page_alloc.c          |  2 +-
 2 files changed, 45 insertions(+), 53 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 230d473f16da..080ea97ad444 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -73,58 +73,19 @@ struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
 	/*
-	 * WARNING: bit 0 of the first word encode PageTail(). That means
-	 * the rest users of the storage space MUST NOT use the bit to
+	 * Five words (20/40 bytes) are available in this union.
+	 * WARNING: bit 0 of the first word is used for PageTail(). That
+	 * means the other users of this union MUST NOT use the bit to
 	 * avoid collision and false-positive PageTail().
 	 */
 	union {
-		struct list_head lru;	/* Pageout list, eg. active_list
-					 * protected by zone_lru_lock !
-					 * Can be used as a generic list
-					 * by the page owner.
-					 */
-		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
-					    * lru or handled by a slab
-					    * allocator, this points to the
-					    * hosting device page map.
-					    */
-		struct {		/* slub per cpu partial pages */
-			struct page *next;	/* Next partial slab */
-#ifdef CONFIG_64BIT
-			int pages;	/* Nr of partial slabs left */
-			int pobjects;	/* Approximate # of objects */
-#else
-			short int pages;
-			short int pobjects;
-#endif
-		};
-
-		struct rcu_head rcu_head;	/* Used by SLAB
-						 * when destroying via RCU
-						 */
-		/* Tail pages of compound page */
-		struct {
-			unsigned long compound_head; /* If bit zero is set */
-
-			/* First tail page only */
-			unsigned char compound_dtor;
-			unsigned char compound_order;
-			/* two/six bytes available here */
-		};
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
-		struct {
-			unsigned long __pad;	/* do not overlay pmd_huge_pte
-						 * with compound_head to avoid
-						 * possible bit 0 collision.
-						 */
-			pgtable_t pmd_huge_pte; /* protected by page->ptl */
-		};
-#endif
-	};
-
-	union {		/* This union is three words (12/24 bytes) in size */
 		struct {	/* Page cache and anonymous pages */
+			/**
+			 * @lru: Pageout list, eg. active_list protected by
+			 * zone_lru_lock.  Sometimes used as a generic list
+			 * by the page owner.
+			 */
+			struct list_head lru;
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
 			pgoff_t index;		/* Our offset within mapping. */
@@ -137,11 +98,20 @@ struct page {
 			unsigned long private;
 		};
 		struct {	/* slab and slob */
+			struct list_head slab_list;
 			struct kmem_cache *slab_cache;
 			void *freelist;		/* first free object */
 			void *s_mem;		/* first object */
 		};
 		struct {	/* slub also uses some of the slab fields */
+			struct page *next;	/* Next partial slab */
+#ifdef CONFIG_64BIT
+			int pages;	/* Nr of partial slabs left */
+			int pobjects;	/* Approximate # of objects */
+#else
+			short int pages;
+			short int pobjects;
+#endif
 			struct kmem_cache *slub_cache;
 			/* Double-word boundary */
 			void *slub_freelist;
@@ -149,17 +119,39 @@ struct page {
 			unsigned objects:15;
 			unsigned frozen:1;
 		};
-		atomic_t compound_mapcount;	/* first tail page */
-		struct list_head deferred_list; /* second tail page */
+		struct {	/* Tail pages of compound page */
+			unsigned long compound_head;	/* Bit zero is set */
+
+			/* First tail page only */
+			unsigned char compound_dtor;
+			unsigned char compound_order;
+			atomic_t compound_mapcount;
+		};
+		struct {	/* Second tail page of compound page */
+			unsigned long _compound_pad_1;	/* compound_head */
+			unsigned long _compound_pad_2;
+			struct list_head deferred_list;
+		};
 		struct {	/* Page table pages */
-			unsigned long _ptl_pad_1;
-			unsigned long _ptl_pad_2;
+			unsigned long _pt_pad_1;	/* compound_head */
+			pgtable_t pmd_huge_pte; /* protected by page->ptl */
+			unsigned long _pt_pad_2;
+			unsigned long _pt_pad_3;
 #if ALLOC_SPLIT_PTLOCKS
 			spinlock_t *ptl;
 #else
 			spinlock_t ptl;
 #endif
 		};
+
+		/** @rcu_head: You can use this to free a page by RCU. */
+		struct rcu_head rcu_head;
+
+		/**
+		 * @pgmap: For ZONE_DEVICE pages, this points to the hosting
+		 * device page map.
+		 */
+		struct dev_pagemap *pgmap;
 	};
 
 	union {		/* This union is 4 bytes in size. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 18720eccbce1..d1e4df7d57bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -944,7 +944,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	}
 	switch (page - head_page) {
 	case 1:
-		/* the first tail page: ->mapping is compound_mapcount() */
+		/* the first tail page: ->mapping may be compound_mapcount() */
 		if (unlikely(compound_mapcount(page))) {
 			bad_page(page, "nonzero compound_mapcount", 0);
 			goto out;
-- 
2.17.0


* [PATCH v3 12/14] mm: Improve struct page documentation
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (10 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 11/14] mm: Combine first two unions in " Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-18 23:32   ` Randy Dunlap
  2018-04-18 18:49 ` [PATCH v3 13/14] slab,slub: Remove rcu_head size checks Matthew Wilcox
  2018-04-18 18:49 ` [PATCH v3 14/14] slub: Remove kmem_cache->reserved Matthew Wilcox
  13 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

Rewrite the documentation to describe what you can use in struct
page rather than what you can't.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 080ea97ad444..13c25b16913d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -33,29 +33,27 @@ struct hmm;
  * it to keep track of whatever it is we are using the page for at the
  * moment. Note that we have no way to track which tasks are using
  * a page, though if it is a pagecache page, rmap structures can tell us
- * who is mapping it. If you allocate the page using alloc_pages(), you
- * can use some of the space in struct page for your own purposes.
+ * who is mapping it.
  *
- * Pages that were once in the page cache may be found under the RCU lock
- * even after they have been recycled to a different purpose.  The page
- * cache reads and writes some of the fields in struct page to pin the
- * page before checking that it's still in the page cache.  It is vital
- * that all users of struct page:
- * 1. Use the first word as PageFlags.
- * 2. Clear or preserve bit 0 of page->compound_head.  It is used as
- *    PageTail for compound pages, and the page cache must not see false
- *    positives.  Some users put a pointer here (guaranteed to be at least
- *    4-byte aligned), other users avoid using the field altogether.
- * 3. page->_refcount must either not be used, or must be used in such a
- *    way that other CPUs temporarily incrementing and then decrementing the
- *    refcount does not cause problems.  On receiving the page from
- *    alloc_pages(), the refcount will be positive.
- * 4. Either preserve page->_mapcount or restore it to -1 before freeing it.
+ * If you allocate the page using alloc_pages(), you can use some of the
+ * space in struct page for your own purposes.  The five words in the first
+ * union are available, except for bit 0 of the first word which must be
+ * kept clear.  Many users use this word to store a pointer to an object
+ * which is guaranteed to be aligned.  If you use the same storage as
+ * page->mapping, you must restore it to NULL before freeing the page.
  *
- * If you allocate pages of order > 0, you can use the fields in the struct
- * page associated with each page, but bear in mind that the pages may have
- * been inserted individually into the page cache, so you must use the above
- * four fields in a compatible way for each struct page.
+ * If your page will not be mapped to userspace, you can also use the 4
+ * bytes in the second union, but you must call page_mapcount_reset()
+ * before freeing it.
+ *
+ * If you want to use the refcount field, it must be used in such a way
+ * that other CPUs temporarily incrementing and then decrementing the
+ * refcount does not cause problems.  On receiving the page from
+ * alloc_pages(), the refcount will be positive.
+ *
+ * If you allocate pages of order > 0, you can use some of the fields
+ * in each subpage, but you may need to restore some of their values
+ * afterwards.
  *
  * SLUB uses cmpxchg_double() to atomically update its freelist and
  * counters.  That requires that freelist & counters be adjacent and
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 13/14] slab,slub: Remove rcu_head size checks
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (11 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 12/14] mm: Improve struct page documentation Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  2018-04-18 18:49 ` [PATCH v3 14/14] slub: Remove kmem_cache->reserved Matthew Wilcox
  13 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

rcu_head may now grow larger than list_head without affecting slab or
slub.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/slab.c |  2 --
 mm/slub.c | 27 ++-------------------------
 2 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index e387a17d6d56..e6ab1327db25 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1235,8 +1235,6 @@ void __init kmem_cache_init(void)
 {
 	int i;
 
-	BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
-					sizeof(struct rcu_head));
 	kmem_cache = &kmem_cache_boot;
 
 	if (!IS_ENABLED(CONFIG_NUMA) || num_possible_nodes() == 1)
diff --git a/mm/slub.c b/mm/slub.c
index f2f64568b25e..8af8ce91062a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1691,17 +1691,9 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__free_pages(page, order);
 }
 
-#define need_reserve_slab_rcu						\
-	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
-
 static void rcu_free_slab(struct rcu_head *h)
 {
-	struct page *page;
-
-	if (need_reserve_slab_rcu)
-		page = virt_to_head_page(h);
-	else
-		page = container_of((struct list_head *)h, struct page, lru);
+	struct page *page = container_of(h, struct page, rcu_head);
 
 	__free_slab(page->slab_cache, page);
 }
@@ -1709,19 +1701,7 @@ static void rcu_free_slab(struct rcu_head *h)
 static void free_slab(struct kmem_cache *s, struct page *page)
 {
 	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) {
-		struct rcu_head *head;
-
-		if (need_reserve_slab_rcu) {
-			int order = compound_order(page);
-			int offset = (PAGE_SIZE << order) - s->reserved;
-
-			VM_BUG_ON(s->reserved != sizeof(*head));
-			head = page_address(page) + offset;
-		} else {
-			head = &page->rcu_head;
-		}
-
-		call_rcu(head, rcu_free_slab);
+		call_rcu(&page->rcu_head, rcu_free_slab);
 	} else
 		__free_slab(s, page);
 }
@@ -3588,9 +3568,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	s->random = get_random_long();
 #endif
 
-	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
-		s->reserved = sizeof(struct rcu_head);
-
 	if (!calculate_sizes(s, -1))
 		goto error;
 	if (disable_higher_order_debug) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 14/14] slub: Remove kmem_cache->reserved
  2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
                   ` (12 preceding siblings ...)
  2018-04-18 18:49 ` [PATCH v3 13/14] slab,slub: Remove rcu_head size checks Matthew Wilcox
@ 2018-04-18 18:49 ` Matthew Wilcox
  13 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 18:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

From: Matthew Wilcox <mawilcox@microsoft.com>

The reserved field was only used for embedding an rcu_head in the data
structure.  With the previous commit, we no longer need it.  That lets
us remove the 'reserved' argument to a lot of functions.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/slub_def.h |  1 -
 mm/slub.c                | 41 ++++++++++++++++++++--------------------
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 3773e26c08c1..09fa2c6f0e68 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -101,7 +101,6 @@ struct kmem_cache {
 	void (*ctor)(void *);
 	unsigned int inuse;		/* Offset to metadata */
 	unsigned int align;		/* Alignment */
-	unsigned int reserved;		/* Reserved bytes at the end of slabs */
 	unsigned int red_left_pad;	/* Left redzone padding size */
 	const char *name;	/* Name (only for display!) */
 	struct list_head list;	/* List of slab caches */
diff --git a/mm/slub.c b/mm/slub.c
index 8af8ce91062a..53aa459d2343 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -317,16 +317,16 @@ static inline unsigned int slab_index(void *p, struct kmem_cache *s, void *addr)
 	return (p - addr) / s->size;
 }
 
-static inline unsigned int order_objects(unsigned int order, unsigned int size, unsigned int reserved)
+static inline unsigned int order_objects(unsigned int order, unsigned int size)
 {
-	return (((unsigned int)PAGE_SIZE << order) - reserved) / size;
+	return ((unsigned int)PAGE_SIZE << order) / size;
 }
 
 static inline struct kmem_cache_order_objects oo_make(unsigned int order,
-		unsigned int size, unsigned int reserved)
+		unsigned int size)
 {
 	struct kmem_cache_order_objects x = {
-		(order << OO_SHIFT) + order_objects(order, size, reserved)
+		(order << OO_SHIFT) + order_objects(order, size)
 	};
 
 	return x;
@@ -841,7 +841,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
 		return 1;
 
 	start = page_address(page);
-	length = (PAGE_SIZE << compound_order(page)) - s->reserved;
+	length = PAGE_SIZE << compound_order(page);
 	end = start + length;
 	remainder = length % s->size;
 	if (!remainder)
@@ -930,7 +930,7 @@ static int check_slab(struct kmem_cache *s, struct page *page)
 		return 0;
 	}
 
-	maxobj = order_objects(compound_order(page), s->size, s->reserved);
+	maxobj = order_objects(compound_order(page), s->size);
 	if (page->objects > maxobj) {
 		slab_err(s, page, "objects %u > max %u",
 			page->objects, maxobj);
@@ -980,7 +980,7 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
 		nr++;
 	}
 
-	max_objects = order_objects(compound_order(page), s->size, s->reserved);
+	max_objects = order_objects(compound_order(page), s->size);
 	if (max_objects > MAX_OBJS_PER_PAGE)
 		max_objects = MAX_OBJS_PER_PAGE;
 
@@ -3198,21 +3198,21 @@ static unsigned int slub_min_objects;
  */
 static inline unsigned int slab_order(unsigned int size,
 		unsigned int min_objects, unsigned int max_order,
-		unsigned int fract_leftover, unsigned int reserved)
+		unsigned int fract_leftover)
 {
 	unsigned int min_order = slub_min_order;
 	unsigned int order;
 
-	if (order_objects(min_order, size, reserved) > MAX_OBJS_PER_PAGE)
+	if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
 		return get_order(size * MAX_OBJS_PER_PAGE) - 1;
 
-	for (order = max(min_order, (unsigned int)get_order(min_objects * size + reserved));
+	for (order = max(min_order, (unsigned int)get_order(min_objects * size));
 			order <= max_order; order++) {
 
 		unsigned int slab_size = (unsigned int)PAGE_SIZE << order;
 		unsigned int rem;
 
-		rem = (slab_size - reserved) % size;
+		rem = slab_size % size;
 
 		if (rem <= slab_size / fract_leftover)
 			break;
@@ -3221,7 +3221,7 @@ static inline unsigned int slab_order(unsigned int size,
 	return order;
 }
 
-static inline int calculate_order(unsigned int size, unsigned int reserved)
+static inline int calculate_order(unsigned int size)
 {
 	unsigned int order;
 	unsigned int min_objects;
@@ -3238,7 +3238,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 	min_objects = slub_min_objects;
 	if (!min_objects)
 		min_objects = 4 * (fls(nr_cpu_ids) + 1);
-	max_objects = order_objects(slub_max_order, size, reserved);
+	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
 	while (min_objects > 1) {
@@ -3247,7 +3247,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 		fraction = 16;
 		while (fraction >= 4) {
 			order = slab_order(size, min_objects,
-					slub_max_order, fraction, reserved);
+					slub_max_order, fraction);
 			if (order <= slub_max_order)
 				return order;
 			fraction /= 2;
@@ -3259,14 +3259,14 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 	 * We were unable to place multiple objects in a slab. Now
 	 * lets see if we can place a single object there.
 	 */
-	order = slab_order(size, 1, slub_max_order, 1, reserved);
+	order = slab_order(size, 1, slub_max_order, 1);
 	if (order <= slub_max_order)
 		return order;
 
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
-	order = slab_order(size, 1, MAX_ORDER, 1, reserved);
+	order = slab_order(size, 1, MAX_ORDER, 1);
 	if (order < MAX_ORDER)
 		return order;
 	return -ENOSYS;
@@ -3534,7 +3534,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	if (forced_order >= 0)
 		order = forced_order;
 	else
-		order = calculate_order(size, s->reserved);
+		order = calculate_order(size);
 
 	if ((int)order < 0)
 		return 0;
@@ -3552,8 +3552,8 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	/*
 	 * Determine the number of objects per slab
 	 */
-	s->oo = oo_make(order, size, s->reserved);
-	s->min = oo_make(get_order(size), size, s->reserved);
+	s->oo = oo_make(order, size);
+	s->min = oo_make(get_order(size), size);
 	if (oo_objects(s->oo) > oo_objects(s->max))
 		s->max = s->oo;
 
@@ -3563,7 +3563,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 {
 	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
-	s->reserved = 0;
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
 	s->random = get_random_long();
 #endif
@@ -5107,7 +5106,7 @@ SLAB_ATTR_RO(destroy_by_rcu);
 
 static ssize_t reserved_show(struct kmem_cache *s, char *buf)
 {
-	return sprintf(buf, "%u\n", s->reserved);
+	return sprintf(buf, "0\n");
 }
 SLAB_ATTR_RO(reserved);
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 12/14] mm: Improve struct page documentation
  2018-04-18 18:49 ` [PATCH v3 12/14] mm: Improve struct page documentation Matthew Wilcox
@ 2018-04-18 23:32   ` Randy Dunlap
  2018-04-18 23:43     ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: Randy Dunlap @ 2018-04-18 23:32 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On 04/18/18 11:49, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Rewrite the documentation to describe what you can use in struct
> page rather than what you can't.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> ---
>  include/linux/mm_types.h | 40 +++++++++++++++++++---------------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 080ea97ad444..13c25b16913d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -33,29 +33,27 @@ struct hmm;
>   * it to keep track of whatever it is we are using the page for at the
>   * moment. Note that we have no way to track which tasks are using
>   * a page, though if it is a pagecache page, rmap structures can tell us
> - * who is mapping it. If you allocate the page using alloc_pages(), you
> - * can use some of the space in struct page for your own purposes.
> + * who is mapping it.
>   *
> - * Pages that were once in the page cache may be found under the RCU lock
> - * even after they have been recycled to a different purpose.  The page
> - * cache reads and writes some of the fields in struct page to pin the
> - * page before checking that it's still in the page cache.  It is vital
> - * that all users of struct page:
> - * 1. Use the first word as PageFlags.
> - * 2. Clear or preserve bit 0 of page->compound_head.  It is used as
> - *    PageTail for compound pages, and the page cache must not see false
> - *    positives.  Some users put a pointer here (guaranteed to be at least
> - *    4-byte aligned), other users avoid using the field altogether.
> - * 3. page->_refcount must either not be used, or must be used in such a
> - *    way that other CPUs temporarily incrementing and then decrementing the
> - *    refcount does not cause problems.  On receiving the page from
> - *    alloc_pages(), the refcount will be positive.
> - * 4. Either preserve page->_mapcount or restore it to -1 before freeing it.
> + * If you allocate the page using alloc_pages(), you can use some of the
> + * space in struct page for your own purposes.  The five words in the first

Using "first union" here...

> + * union are available, except for bit 0 of the first word which must be
> + * kept clear.  Many users use this word to store a pointer to an object
> + * which is guaranteed to be aligned.  If you use the same storage as
> + * page->mapping, you must restore it to NULL before freeing the page.
>   *
> - * If you allocate pages of order > 0, you can use the fields in the struct
> - * page associated with each page, but bear in mind that the pages may have
> - * been inserted individually into the page cache, so you must use the above
> - * four fields in a compatible way for each struct page.
> + * If your page will not be mapped to userspace, you can also use the 4
> + * bytes in the second union, but you must call page_mapcount_reset()

and "second union" here bother me, but it looks like they are anonymous.

I'm concerned about someone other than you modifying struct page at some
later time.  If these unions were named (and you could use that name here
instead of "first" or "second"), then there would be less chance for that
next person to miss modifying that comment or it just becoming stale.


Reviewed-by: Randy Dunlap <rdunlap@infradead.org>

> + * before freeing it.
> + *
> + * If you want to use the refcount field, it must be used in such a way
> + * that other CPUs temporarily incrementing and then decrementing the
> + * refcount does not cause problems.  On receiving the page from
> + * alloc_pages(), the refcount will be positive.
> + *
> + * If you allocate pages of order > 0, you can use some of the fields
> + * in each subpage, but you may need to restore some of their values
> + * afterwards.
>   *
>   * SLUB uses cmpxchg_double() to atomically update its freelist and
>   * counters.  That requires that freelist & counters be adjacent and

-- 
~Randy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 12/14] mm: Improve struct page documentation
  2018-04-18 23:32   ` Randy Dunlap
@ 2018-04-18 23:43     ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-18 23:43 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, Apr 18, 2018 at 04:32:27PM -0700, Randy Dunlap wrote:
> > + * If you allocate the page using alloc_pages(), you can use some of the
> > + * space in struct page for your own purposes.  The five words in the first
> 
> Using "first union" here...
> 
> > + * If your page will not be mapped to userspace, you can also use the 4
> > + * bytes in the second union, but you must call page_mapcount_reset()
> 
> and "second union" here bother me, but it looks like they are anonymous.
> 
> I'm concerned about someone other than you modifying struct page at some
> later time.  If these unions were named (and you could use that name here
> instead of "first" or "second"), then there would be less chance for that
> next person to miss modifying that comment or it just becoming stale.

Yeah, it bothers me too.  I was first bothered by this when writing the
patch descriptions for the earlier patches in the series: "Combine first
three unions in struct page" and "Combine first two unions in struct
page" really suck as patch descriptions.  But I couldn't come up with
anything better, so here we are ...

If we name the union, then either we have to put in some grotty macros
or change every instance of page->mapping to page->u1.mapping (repeat
ad nauseam).  I mean, I'd rather leave the unions anonymous and name
the structs (but again, I don't want to rename every user).

We can put a comment on the union and name them that way, but I
don't even know what to call them.  "main union" "auxiliary union".
"secondary union".  I don't know.

> Reviewed-by: Randy Dunlap <rdunlap@infradead.org>

Thanks.  I also did some kernel-doc'ifying of some other comments earlier
in the series.  I'm sure they could be improved.  And there's a whole
bunch of comments which aren't in kernel-doc format and might or might
not want to be.

(eg: do we want to comment _refcount?  Other than to tell people to not
use it?)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/14] mm: Split page_type out from _mapcount
  2018-04-18 18:49 ` [PATCH v3 02/14] mm: Split page_type out from _mapcount Matthew Wilcox
@ 2018-04-19  9:04   ` Vlastimil Babka
  2018-04-19 11:16     ` Matthew Wilcox
  2018-04-20 15:17   ` Christopher Lameter
  1 sibling, 1 reply; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19  9:04 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> We're already using a union of many fields here, so stop abusing the
> _mapcount and make page_type its own field.  That implies renaming some
> of the machinery that creates PageBuddy, PageBalloon and PageKmemcg;
> bring back the PG_buddy, PG_balloon and PG_kmemcg names.
> 
> As suggested by Kirill, make page_type a bitmask.  Because it starts out
> life as -1 (thanks to sharing the storage with _mapcount), setting a
> page flag means clearing the appropriate bit.  This gives us space for
> probably twenty or so extra bits (depending how paranoid we want to be
> about _mapcount underflow).
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  include/linux/mm_types.h   | 13 ++++++-----
>  include/linux/page-flags.h | 45 ++++++++++++++++++++++----------------
>  kernel/crash_core.c        |  1 +
>  mm/page_alloc.c            | 13 +++++------
>  scripts/tags.sh            |  6 ++---
>  5 files changed, 43 insertions(+), 35 deletions(-)

...

> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index e34a27727b9a..8c25b28a35aa 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -642,49 +642,56 @@ PAGEFLAG_FALSE(DoubleMap)
>  #endif
>  
>  /*
> - * For pages that are never mapped to userspace, page->mapcount may be
> - * used for storing extra information about page type. Any value used
> - * for this purpose must be <= -2, but it's better start not too close
> - * to -2 so that an underflow of the page_mapcount() won't be mistaken
> - * for a special page.
> + * For pages that are never mapped to userspace (and aren't PageSlab),
> + * page_type may be used.  Because it is initialised to -1, we invert the
> + * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
> + * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
> + * low bits so that an underflow or overflow of page_mapcount() won't be
> + * mistaken for a page type value.
>   */
> -#define PAGE_MAPCOUNT_OPS(uname, lname)					\
> +
> +#define PAGE_TYPE_BASE	0xf0000000
> +/* Reserve		0x0000007f to catch underflows of page_mapcount */
> +#define PG_buddy	0x00000080
> +#define PG_balloon	0x00000100
> +#define PG_kmemcg	0x00000200
> +
> +#define PageType(page, flag)						\
> +	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> +
> +#define PAGE_TYPE_OPS(uname, lname)					\
>  static __always_inline int Page##uname(struct page *page)		\
>  {									\
> -	return atomic_read(&page->_mapcount) ==				\
> -				PAGE_##lname##_MAPCOUNT_VALUE;		\
> +	return PageType(page, PG_##lname);				\
>  }									\
>  static __always_inline void __SetPage##uname(struct page *page)		\
>  {									\
> -	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);	\
> -	atomic_set(&page->_mapcount, PAGE_##lname##_MAPCOUNT_VALUE);	\
> +	VM_BUG_ON_PAGE(!PageType(page, 0), page);			\

I think this debug test does less than you expect? IIUC you want to
check that no type is yet set, but this will only trigger if something
cleared one of the bits in the top nibble covered by PAGE_TYPE_BASE?
Just keep the comparison to -1 then?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 03/14] mm: Mark pages in use for page tables
  2018-04-18 18:49 ` [PATCH v3 03/14] mm: Mark pages in use for page tables Matthew Wilcox
@ 2018-04-19  9:30   ` Vlastimil Babka
  0 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19  9:30 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Define a new PageTable bit in the page_type and use it to mark pages in
> use as page tables.  This can be helpful when debugging crashdumps or

Yep, I once added such a flag for debugging myself :)

> analysing memory fragmentation.  Add a KPF flag to report these pages
> to userspace and update page-types.c to interpret that flag.
> 
> Note that only pages currently accounted as NR_PAGETABLES are tracked
> as PageTable; this does not include pgd/p4d/pud/pmd pages.  Those will
> be the subject of a later patch.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/14] mm: Switch s_mem and slab_cache in struct page
  2018-04-18 18:49 ` [PATCH v3 04/14] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
@ 2018-04-19 11:06   ` Vlastimil Babka
  2018-04-19 11:19     ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 11:06 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>

More rationale? Such as "This will allow us to ... later in the series"?

> slub now needs to set page->mapping to NULL as it frees the page, just
> like slab does.

I wonder if they should be touching the mapping field rather than
the slab_cache field, with a comment on why it has to be NULLed?

> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> ---
>  include/linux/mm_types.h | 4 ++--
>  mm/slub.c                | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 41828fb34860..e97a310a6abe 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -83,7 +83,7 @@ struct page {
>  		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
>  		struct address_space *mapping;
>  
> -		void *s_mem;			/* slab first object */
> +		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
>  		atomic_t compound_mapcount;	/* first tail page */
>  		/* page_deferred_list().next	 -- second tail page */
>  	};
> @@ -194,7 +194,7 @@ struct page {
>  		spinlock_t ptl;
>  #endif
>  #endif
> -		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
> +		void *s_mem;			/* slab first object */
>  	};
>  
>  #ifdef CONFIG_MEMCG
> diff --git a/mm/slub.c b/mm/slub.c
> index 099925cf456a..27b6ba1c116a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1690,6 +1690,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
>  	__ClearPageSlab(page);
>  
>  	page_mapcount_reset(page);
> +	page->mapping = NULL;
>  	if (current->reclaim_state)
>  		current->reclaim_state->reclaimed_slab += pages;
>  	memcg_uncharge_slab(page, order, s);
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/14] mm: Split page_type out from _mapcount
  2018-04-19  9:04   ` Vlastimil Babka
@ 2018-04-19 11:16     ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-19 11:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On Thu, Apr 19, 2018 at 11:04:23AM +0200, Vlastimil Babka wrote:
> On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> > 
> > We're already using a union of many fields here, so stop abusing the
> > _mapcount and make page_type its own field.  That implies renaming some
> > of the machinery that creates PageBuddy, PageBalloon and PageKmemcg;
> > bring back the PG_buddy, PG_balloon and PG_kmemcg names.
> > 
> > As suggested by Kirill, make page_type a bitmask.  Because it starts out
> > life as -1 (thanks to sharing the storage with _mapcount), setting a
> > page flag means clearing the appropriate bit.  This gives us space for
> > probably twenty or so extra bits (depending how paranoid we want to be
> > about _mapcount underflow).
> > 
> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  include/linux/mm_types.h   | 13 ++++++-----
> >  include/linux/page-flags.h | 45 ++++++++++++++++++++++----------------
> >  kernel/crash_core.c        |  1 +
> >  mm/page_alloc.c            | 13 +++++------
> >  scripts/tags.sh            |  6 ++---
> >  5 files changed, 43 insertions(+), 35 deletions(-)
> 
> ...
> 
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index e34a27727b9a..8c25b28a35aa 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -642,49 +642,56 @@ PAGEFLAG_FALSE(DoubleMap)
> >  #endif
> >  
> >  /*
> > - * For pages that are never mapped to userspace, page->mapcount may be
> > - * used for storing extra information about page type. Any value used
> > - * for this purpose must be <= -2, but it's better start not too close
> > - * to -2 so that an underflow of the page_mapcount() won't be mistaken
> > - * for a special page.
> > + * For pages that are never mapped to userspace (and aren't PageSlab),
> > + * page_type may be used.  Because it is initialised to -1, we invert the
> > + * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
> > + * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
> > + * low bits so that an underflow or overflow of page_mapcount() won't be
> > + * mistaken for a page type value.
> >   */
> > -#define PAGE_MAPCOUNT_OPS(uname, lname)					\
> > +
> > +#define PAGE_TYPE_BASE	0xf0000000
> > +/* Reserve		0x0000007f to catch underflows of page_mapcount */
> > +#define PG_buddy	0x00000080
> > +#define PG_balloon	0x00000100
> > +#define PG_kmemcg	0x00000200
> > +
> > +#define PageType(page, flag)						\
> > +	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> > +
> > +#define PAGE_TYPE_OPS(uname, lname)					\
> >  static __always_inline int Page##uname(struct page *page)		\
> >  {									\
> > -	return atomic_read(&page->_mapcount) ==				\
> > -				PAGE_##lname##_MAPCOUNT_VALUE;		\
> > +	return PageType(page, PG_##lname);				\
> >  }									\
> >  static __always_inline void __SetPage##uname(struct page *page)		\
> >  {									\
> > -	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);	\
> > -	atomic_set(&page->_mapcount, PAGE_##lname##_MAPCOUNT_VALUE);	\
> > +	VM_BUG_ON_PAGE(!PageType(page, 0), page);			\
> 
> I think this debug test does less than you expect? IIUC you want to
> check that no type is yet set, but this will only trigger if something
> cleared one of the bits in the top nibble covered by PAGE_TYPE_BASE?
> Just keep the comparison to -1 then?

With this patchset, it becomes possible to set more than one of the
PageType bits.  It doesn't make sense to set PageBuddy and PageKmemcg,
but maybe it makes sense to set PageKmemcg and PageTable?

So yes, I weakened this test, but I did so deliberately.
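
For what it's worth, here is a tiny userspace sketch of how the
inverted-sense bits compose (constants as in the hunk above, except
PG_table, which a later patch in the series adds -- the 0x400 value is
only assumed here for illustration, and PageType() is adapted to take
the raw value instead of a struct page):

	#include <assert.h>

	#define PAGE_TYPE_BASE	0xf0000000
	#define PG_kmemcg	0x00000200
	#define PG_table	0x00000400	/* assumed value, for illustration */

	#define PageType(type, flag) \
		(((type) & (PAGE_TYPE_BASE | (flag))) == PAGE_TYPE_BASE)

	int main(void)
	{
		/* page_type starts out as -1, inherited from _mapcount */
		unsigned int page_type = 0xffffffff;

		page_type &= ~PG_kmemcg;	/* __SetPageKmemcg */
		page_type &= ~PG_table;		/* __SetPageTable */

		/* both tests pass; the 0xf0000000 nibble is still intact */
		assert(PageType(page_type, PG_kmemcg));
		assert(PageType(page_type, PG_table));

		/* and so does the weakened VM_BUG_ON check, PageType(page, 0) */
		assert(PageType(page_type, 0));
		return 0;
	}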

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/14] mm: Switch s_mem and slab_cache in struct page
  2018-04-19 11:06   ` Vlastimil Babka
@ 2018-04-19 11:19     ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-19 11:19 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On Thu, Apr 19, 2018 at 01:06:30PM +0200, Vlastimil Babka wrote:
> On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> More rationale? Such as "This will allow us to ... later in the series"?

Sure.  Probably the best rationale at this point is that it'll allow us
to move slub's counters into a union with s_mem later in the series.
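
Concretely, this is roughly what ends up sharing that word after patch 5
(a compilable sketch of just the union in question; the private/ptl
members are omitted here):

	struct fourth_word {			/* sketch, not the real struct page */
		union {
			void *s_mem;		/* slab first object */
			unsigned long counters;	/* SLUB */
			struct {		/* SLUB */
				unsigned inuse:16;
				unsigned objects:15;
				unsigned frozen:1;
			};
		};
	};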

> > slub now needs to set page->mapping to NULL as it frees the page, just
> > like slab does.
> 
> I wonder if they should be touching the mapping field rather than
> the slab_cache field, with a comment on why it has to be NULLed?

I add that to the documentation at the end of the series:

 * If you allocate the page using alloc_pages(), you can use some of the
 * space in struct page for your own purposes.  The five words in the first
 * union are available, except for bit 0 of the first word which must be
 * kept clear.  Many users use this word to store a pointer to an object
 * which is guaranteed to be aligned.  If you use the same storage as
 * page->mapping, you must restore it to NULL before freeing the page.

Thanks for your review!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/14] mm: Move 'private' union within struct page
  2018-04-18 18:49 ` [PATCH v3 05/14] mm: Move 'private' union within " Matthew Wilcox
@ 2018-04-19 11:31   ` Vlastimil Babka
  2018-04-20 15:25   ` Christopher Lameter
  2018-04-30  9:38   ` Kirill A. Shutemov
  2 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 11:31 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> By moving page->private to the fourth word of struct page, we can put
> the SLUB counters in the same word as SLAB's s_mem and still do the
> cmpxchg_double trick.  Now the SLUB counters no longer overlap with
> the refcount.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 06/14] mm: Move _refcount out of struct page union
  2018-04-18 18:49 ` [PATCH v3 06/14] mm: Move _refcount out of struct page union Matthew Wilcox
@ 2018-04-19 11:37   ` Vlastimil Babka
  2018-04-30  9:40   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 11:37 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Keeping the refcount in the union only encourages people to put
> something else in the union which will overlap with _refcount and
> eventually explode messily.  pahole reports no fields change location.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/14] mm: Use page->deferred_list
  2018-04-18 18:49 ` [PATCH v3 09/14] mm: Use page->deferred_list Matthew Wilcox
@ 2018-04-19 13:23   ` Vlastimil Babka
  2018-04-30  9:43   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 13:23 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Now that we can represent the location of 'deferred_list' in C instead
> of comments, make use of that ability.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/huge_memory.c | 7 ++-----
>  mm/page_alloc.c  | 2 +-
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 14ed6ee5e02f..55ad852fbf17 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -483,11 +483,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>  
>  static inline struct list_head *page_deferred_list(struct page *page)
>  {
> -	/*
> -	 * ->lru in the tail pages is occupied by compound_head.
> -	 * Let's use ->mapping + ->index in the second tail page as list_head.
> -	 */
> -	return (struct list_head *)&page[2].mapping;
> +	/* ->lru in the tail pages is occupied by compound_head. */
> +	return &page[2].deferred_list;
>  }
>  
>  void prep_transhuge_page(struct page *page)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 88e817d7ccef..18720eccbce1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -953,7 +953,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
>  	case 2:
>  		/*
>  		 * the second tail page: ->mapping is
> -		 * page_deferred_list().next -- ignore value.
> +		 * deferred_list.next -- ignore value.
>  		 */
>  		break;
>  	default:
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/14] slub: Remove page->counters
  2018-04-18 18:49 ` [PATCH v3 07/14] slub: Remove page->counters Matthew Wilcox
@ 2018-04-19 13:42   ` Vlastimil Babka
  2018-04-19 14:23     ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 13:42 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Use page->private instead, now that these two fields are in the same
> location.  Include a compile-time assert that the fields don't get out
> of sync.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Why not retain a small union of "counters" and inuse/objects/frozen
within SLUB's sub-structure? IMHO it would be more obvious and
reduce churn?

...

> @@ -358,17 +359,10 @@ static __always_inline void slab_unlock(struct page *page)
>  
>  static inline void set_page_slub_counters(struct page *page, unsigned long counters_new)
>  {
> -	struct page tmp;
> -	tmp.counters = counters_new;
> -	/*
> -	 * page->counters can cover frozen/inuse/objects as well
> -	 * as page->_refcount.  If we assign to ->counters directly
> -	 * we run the risk of losing updates to page->_refcount, so
> -	 * be careful and only assign to the fields we need.
> -	 */
> -	page->frozen  = tmp.frozen;
> -	page->inuse   = tmp.inuse;
> -	page->objects = tmp.objects;

BTW was this ever safe to begin with? IIRC bitfields are frowned upon as
a potential RMW. Or is there at least still a guarantee that the RMW happens
only within the 32-bit struct and not the whole 64-bit word, which also
used to include _refcount?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 08/14] mm: Combine first three unions in struct page
  2018-04-18 18:49 ` [PATCH v3 08/14] mm: Combine first three unions in struct page Matthew Wilcox
@ 2018-04-19 13:46   ` Vlastimil Babka
  2018-04-19 14:08     ` Matthew Wilcox
  2018-04-30  9:42   ` Kirill A. Shutemov
  1 sibling, 1 reply; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 13:46 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> By combining these three one-word unions into one three-word union,
> we make it easier for users to add their own multi-word fields to struct
> page, as well as making it obvious that SLUB needs to keep its double-word
> alignment for its freelist & counters.
> 
> No field moves position; verified with pahole.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> ---
>  include/linux/mm_types.h | 65 ++++++++++++++++++++--------------------
>  1 file changed, 32 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 04d9dc442029..39521b8385c1 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -70,45 +70,44 @@ struct hmm;
>  #endif
>  
>  struct page {
> -	/* First double word block */
>  	unsigned long flags;		/* Atomic flags, some possibly
>  					 * updated asynchronously */
> -	union {
> -		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
> -		struct address_space *mapping;
> -
> -		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
> +	union {		/* This union is three words (12/24 bytes) in size */
> +		struct {	/* Page cache and anonymous pages */
> +			/* See page-flags.h for PAGE_MAPPING_FLAGS */
> +			struct address_space *mapping;
> +			pgoff_t index;		/* Our offset within mapping. */
> +			/**
> +			 * @private: Mapping-private opaque data.
> +			 * Usually used for buffer_heads if PagePrivate.
> +			 * Used for swp_entry_t if PageSwapCache.
> +			 * Indicates order in the buddy system if PageBuddy.
> +			 */
> +			unsigned long private;
> +		};
> +		struct {	/* slab and slob */
> +			struct kmem_cache *slab_cache;
> +			void *freelist;		/* first free object */
> +			void *s_mem;		/* first object */
> +		};
> +		struct {	/* slub also uses some of the slab fields */
> +			struct kmem_cache *slub_cache;
> +			/* Double-word boundary */
> +			void *slub_freelist;

Is slub going to switch to use those? Or maybe this is overkill and
we could merge the two sl*b structs and just have a union for s_mem and
the 3 counters?

> +			unsigned inuse:16;
> +			unsigned objects:15;
> +			unsigned frozen:1;
> +		};

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 10/14] mm: Move lru union within struct page
  2018-04-18 18:49 ` [PATCH v3 10/14] mm: Move lru union within struct page Matthew Wilcox
@ 2018-04-19 13:56   ` Vlastimil Babka
  2018-04-30  9:44   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 13:56 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Since the LRU is two words, this does not affect the double-word
> alignment of SLUB's freelist.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/mm_types.h | 102 +++++++++++++++++++--------------------
>  1 file changed, 51 insertions(+), 51 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 39521b8385c1..230d473f16da 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -72,6 +72,57 @@ struct hmm;
>  struct page {
>  	unsigned long flags;		/* Atomic flags, some possibly
>  					 * updated asynchronously */
> +	/*
> +	 * WARNING: bit 0 of the first word encode PageTail(). That means
> +	 * the rest users of the storage space MUST NOT use the bit to
> +	 * avoid collision and false-positive PageTail().
> +	 */
> +	union {
> +		struct list_head lru;	/* Pageout list, eg. active_list
> +					 * protected by zone_lru_lock !
> +					 * Can be used as a generic list
> +					 * by the page owner.
> +					 */
> +		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
> +					    * lru or handled by a slab
> +					    * allocator, this points to the
> +					    * hosting device page map.
> +					    */
> +		struct {		/* slub per cpu partial pages */
> +			struct page *next;	/* Next partial slab */
> +#ifdef CONFIG_64BIT
> +			int pages;	/* Nr of partial slabs left */
> +			int pobjects;	/* Approximate # of objects */
> +#else
> +			short int pages;
> +			short int pobjects;
> +#endif
> +		};
> +
> +		struct rcu_head rcu_head;	/* Used by SLAB
> +						 * when destroying via RCU
> +						 */
> +		/* Tail pages of compound page */
> +		struct {
> +			unsigned long compound_head; /* If bit zero is set */
> +
> +			/* First tail page only */
> +			unsigned char compound_dtor;
> +			unsigned char compound_order;
> +			/* two/six bytes available here */
> +		};
> +
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> +		struct {
> +			unsigned long __pad;	/* do not overlay pmd_huge_pte
> +						 * with compound_head to avoid
> +						 * possible bit 0 collision.
> +						 */
> +			pgtable_t pmd_huge_pte; /* protected by page->ptl */
> +		};
> +#endif
> +	};
> +
>  	union {		/* This union is three words (12/24 bytes) in size */
>  		struct {	/* Page cache and anonymous pages */
>  			/* See page-flags.h for PAGE_MAPPING_FLAGS */
> @@ -133,57 +184,6 @@ struct page {
>  	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
>  	atomic_t _refcount;
>  
> -	/*
> -	 * WARNING: bit 0 of the first word encode PageTail(). That means
> -	 * the rest users of the storage space MUST NOT use the bit to
> -	 * avoid collision and false-positive PageTail().
> -	 */
> -	union {
> -		struct list_head lru;	/* Pageout list, eg. active_list
> -					 * protected by zone_lru_lock !
> -					 * Can be used as a generic list
> -					 * by the page owner.
> -					 */
> -		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
> -					    * lru or handled by a slab
> -					    * allocator, this points to the
> -					    * hosting device page map.
> -					    */
> -		struct {		/* slub per cpu partial pages */
> -			struct page *next;	/* Next partial slab */
> -#ifdef CONFIG_64BIT
> -			int pages;	/* Nr of partial slabs left */
> -			int pobjects;	/* Approximate # of objects */
> -#else
> -			short int pages;
> -			short int pobjects;
> -#endif
> -		};
> -
> -		struct rcu_head rcu_head;	/* Used by SLAB
> -						 * when destroying via RCU
> -						 */
> -		/* Tail pages of compound page */
> -		struct {
> -			unsigned long compound_head; /* If bit zero is set */
> -
> -			/* First tail page only */
> -			unsigned char compound_dtor;
> -			unsigned char compound_order;
> -			/* two/six bytes available here */
> -		};
> -
> -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> -		struct {
> -			unsigned long __pad;	/* do not overlay pmd_huge_pte
> -						 * with compound_head to avoid
> -						 * possible bit 0 collision.
> -						 */
> -			pgtable_t pmd_huge_pte; /* protected by page->ptl */
> -		};
> -#endif
> -	};
> -
>  #ifdef CONFIG_MEMCG
>  	struct mem_cgroup *mem_cgroup;
>  #endif
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 11/14] mm: Combine first two unions in struct page
  2018-04-18 18:49 ` [PATCH v3 11/14] mm: Combine first two unions in " Matthew Wilcox
@ 2018-04-19 14:03   ` Vlastimil Babka
  2018-04-30  9:47   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2018-04-19 14:03 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> This gives us five words of space in a single union in struct page.
> The compound_mapcount moves position (from offset 24 to offset 20)
> on 64-bit systems, but that does not seem likely to cause any trouble.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 08/14] mm: Combine first three unions in struct page
  2018-04-19 13:46   ` Vlastimil Babka
@ 2018-04-19 14:08     ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-19 14:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On Thu, Apr 19, 2018 at 03:46:42PM +0200, Vlastimil Babka wrote:
> > +		struct {	/* slab and slob */
> > +			struct kmem_cache *slab_cache;
> > +			void *freelist;		/* first free object */
> > +			void *s_mem;		/* first object */
> > +		};
> > +		struct {	/* slub also uses some of the slab fields */
> > +			struct kmem_cache *slub_cache;
> > +			/* Double-word boundary */
> > +			void *slub_freelist;
> 
> Is slub going to switch to use those? Or maybe this is overkill and
> we could merge the two sl*b structs and just have a union for s_mem and
> the 3 counters?

It ends up looking pretty cruddy if you do that:

		struct {
			union {
				struct list_head slab_list;
				struct {
					struct page *next;
					unsigned int pobjects;
					unsigned int pages;
				};
			};
			struct kmem_cache *slab_cache;
			void *freelist;		/* first free object */
			union {
				void *s_mem;		/* first object */
				struct {
					unsigned inuse:16;
					unsigned objects:15;
					unsigned frozen:1;
				};
			};
		};

At least I don't enjoy the five layers of indentation ...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/14] slub: Remove page->counters
  2018-04-19 13:42   ` Vlastimil Babka
@ 2018-04-19 14:23     ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-19 14:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg

On Thu, Apr 19, 2018 at 03:42:37PM +0200, Vlastimil Babka wrote:
> On 04/18/2018 08:49 PM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> > 
> > Use page->private instead, now that these two fields are in the same
> > location.  Include a compile-time assert that the fields don't get out
> > of sync.
> > 
> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Why not retain a small union of "counters" and inuse/objects/frozen
> within SLUB's sub-structure? IMHO it would be more obvious and
> reduce churn?

Could do.  Same issues with five layers of indentation though.
If there's consensus that that's a better way to go, then I'll redo
the series to look that way.

There is a way to reduce the indentation ... but we'd have to compile
with -fms-extensions (or -fplan9-extensions, but that wasn't added until
gcc 4.6, whereas -fms-extensions was added back in the egcs days).

-fms-extensions lets you do this:

struct a { int b; int c; };
struct d { struct a; int e; };
int init(struct d *);

int main(void)
{
	struct d d;
	init(&d);
	return d.b + d.c + d.e;
}

so we could then:

struct slub_counters {
	union {
		unsigned long counters;
		struct {
			unsigned inuse:16;
			unsigned objects:15;
			unsigned frozen:1;
		};
	};
};

struct page {
	union {
		struct {
			union {
				void *s_mem;
				struct slub_counters;

Given my employer, a request to add -fms-extensions to the Makefile
might be regarded with a certain amount of suspicion ;-)

> > @@ -358,17 +359,10 @@ static __always_inline void slab_unlock(struct page *page)
> >  
> >  static inline void set_page_slub_counters(struct page *page, unsigned long counters_new)
> >  {
> > -	struct page tmp;
> > -	tmp.counters = counters_new;
> > -	/*
> > -	 * page->counters can cover frozen/inuse/objects as well
> > -	 * as page->_refcount.  If we assign to ->counters directly
> > -	 * we run the risk of losing updates to page->_refcount, so
> > -	 * be careful and only assign to the fields we need.
> > -	 */
> > -	page->frozen  = tmp.frozen;
> > -	page->inuse   = tmp.inuse;
> > -	page->objects = tmp.objects;
> 
> BTW was this ever safe to begin with? IIRC bitfields are frowned upon as
> a potential RMW. Or is there at least still a guarantee that the RMW happens
> only within the 32-bit struct and not the whole 64-bit word, which also
> used to include _refcount?

I prefer not to think about it.  Indeed, I don't think that doing
page->tmp = tmp; where both are 32-bit quantities is guaranteed to not
do an RMW.
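
To make the concern concrete, here is a small userspace mock of the old
layout (a sketch only; the real union in struct page had more members
than shown).  Single-threaded it always prints 42; the worry is whether
the compiler may implement the bitfield stores as a wider read-modify-write
that also covers _refcount, which would lose a concurrent update from
another CPU:

	#include <stdio.h>

	struct old_page_mock {
		union {
			unsigned long counters;		/* one 64-bit word on LP64 */
			struct {
				struct {		/* first 32 bits */
					unsigned inuse:16;
					unsigned objects:15;
					unsigned frozen:1;
				};
				int _refcount;		/* the other 32 bits */
			};
		};
	};

	int main(void)
	{
		struct old_page_mock p = { .counters = 0 };

		p._refcount = 42;

		/* each store below is a read-modify-write of the bitfield unit */
		p.frozen = 1;
		p.inuse = 3;
		p.objects = 7;

		printf("_refcount = %d\n", p._refcount);
		return 0;
	}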

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/14] mm: Split page_type out from _mapcount
  2018-04-18 18:49 ` [PATCH v3 02/14] mm: Split page_type out from _mapcount Matthew Wilcox
  2018-04-19  9:04   ` Vlastimil Babka
@ 2018-04-20 15:17   ` Christopher Lameter
  2018-04-20 20:43     ` Matthew Wilcox
  1 sibling, 1 reply; 43+ messages in thread
From: Christopher Lameter @ 2018-04-20 15:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, 18 Apr 2018, Matthew Wilcox wrote:

> As suggested by Kirill, make page_type a bitmask.  Because it starts out
> life as -1 (thanks to sharing the storage with _mapcount), setting a
> page flag means clearing the appropriate bit.  This gives us space for
> probably twenty or so extra bits (depending how paranoid we want to be
> about _mapcount underflow).

Could we use bits in the page->flags for this? We could remove the node or
something else from page->flags. And the slab bit could also be part of
the page type.

The page field handling gets more and more bizarre.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/14] mm: Move 'private' union within struct page
  2018-04-18 18:49 ` [PATCH v3 05/14] mm: Move 'private' union within " Matthew Wilcox
  2018-04-19 11:31   ` Vlastimil Babka
@ 2018-04-20 15:25   ` Christopher Lameter
  2018-04-20 20:27     ` Matthew Wilcox
  2018-04-30  9:38   ` Kirill A. Shutemov
  2 siblings, 1 reply; 43+ messages in thread
From: Christopher Lameter @ 2018-04-20 15:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, 18 Apr 2018, Matthew Wilcox wrote:

> @@ -95,6 +89,30 @@ struct page {
>  		/* page_deferred_list().prev	-- second tail page */
>  	};
>
> +	union {
> +		/*
> +		 * Mapping-private opaque data:
> +		 * Usually used for buffer_heads if PagePrivate
> +		 * Used for swp_entry_t if PageSwapCache
> +		 * Indicates order in the buddy system if PageBuddy
> +		 */
> +		unsigned long private;
> +#if USE_SPLIT_PTE_PTLOCKS
> +#if ALLOC_SPLIT_PTLOCKS
> +		spinlock_t *ptl;
> +#else
> +		spinlock_t ptl;

^^^^ This used to be defined at the end of the struct so that you could
have larger structs for spinlocks here (debugging and some such thing).

Could this not misalign the rest?


> +#endif
> +#endif
> +		void *s_mem;			/* slab first object */
> +		unsigned long counters;		/* SLUB */
> +		struct {			/* SLUB */
> +			unsigned inuse:16;
> +			unsigned objects:15;
> +			unsigned frozen:1;
> +		};
> +	};
> +
>  	union {

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/14] mm: Move 'private' union within struct page
  2018-04-20 15:25   ` Christopher Lameter
@ 2018-04-20 20:27     ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-20 20:27 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Fri, Apr 20, 2018 at 10:25:14AM -0500, Christopher Lameter wrote:
> On Wed, 18 Apr 2018, Matthew Wilcox wrote:
> > +	union {
> > +		unsigned long private;
> > +#if USE_SPLIT_PTE_PTLOCKS
> > +#if ALLOC_SPLIT_PTLOCKS
> > +		spinlock_t *ptl;
> > +#else
> > +		spinlock_t ptl;
> 
> ^^^^ This used to be defined at the end of the struct so that you could
> have larger structs for spinlocks here (debugging and some such thing).
> 
> Could this not misalign the rest?

Nope:

include/linux/mm_types_task.h:#define ALLOC_SPLIT_PTLOCKS       (SPINLOCK_SIZE > BITS_PER_LONG/8)

So we'll only store a spinlock here if it's <= sizeof(long); otherwise
we'll store a pointer here.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/14] mm: Split page_type out from _mapcount
  2018-04-20 15:17   ` Christopher Lameter
@ 2018-04-20 20:43     ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-20 20:43 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Fri, Apr 20, 2018 at 10:17:32AM -0500, Christopher Lameter wrote:
> On Wed, 18 Apr 2018, Matthew Wilcox wrote:
> 
> > As suggested by Kirill, make page_type a bitmask.  Because it starts out
> > life as -1 (thanks to sharing the storage with _mapcount), setting a
> > page flag means clearing the appropriate bit.  This gives us space for
> > probably twenty or so extra bits (depending how paranoid we want to be
> > about _mapcount underflow).
> 
> Could we use bits in the page->flags for this? We could remove the node or
> something else from page->flags. And the slab bit could also be part of
> the page type.
> 
> The page field handling gets more and more bizarre.

I don't think I'm making it any more bizarre than it already is :-)

Indeed, I think this patch makes it *more* sane.  Before this patch, there
are three magic values that might sometimes be stored in ->_mapcount.
After this patch, that's split into its own field, and the magic values
are actually flags, so they can be combined.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/14] mm: Move 'private' union within struct page
  2018-04-18 18:49 ` [PATCH v3 05/14] mm: Move 'private' union within " Matthew Wilcox
  2018-04-19 11:31   ` Vlastimil Babka
  2018-04-20 15:25   ` Christopher Lameter
@ 2018-04-30  9:38   ` Kirill A. Shutemov
  2 siblings, 0 replies; 43+ messages in thread
From: Kirill A. Shutemov @ 2018-04-30  9:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, Apr 18, 2018 at 11:49:03AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> By moving page->private to the fourth word of struct page, we can put
> the SLUB counters in the same word as SLAB's s_mem and still do the
> cmpxchg_double trick.  Now the SLUB counters no longer overlap with
> the refcount.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
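
For reference, the "trick" is the double-word compare-and-exchange that
SLUB does on freelist and counters (a simplified sketch of the pattern
in mm/slub.c, not an exact call site):

	/* update freelist and counters as one aligned two-word unit */
	if (cmpxchg_double(&page->freelist, &page->counters,
			   freelist_old, counters_old,
			   freelist_new, counters_new))
		return true;	/* both words were swapped atomically */

It only works if freelist and counters sit next to each other on a
double-word boundary, which is the constraint the rearrangement has to
preserve.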

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 06/14] mm: Move _refcount out of struct page union
  2018-04-18 18:49 ` [PATCH v3 06/14] mm: Move _refcount out of struct page union Matthew Wilcox
  2018-04-19 11:37   ` Vlastimil Babka
@ 2018-04-30  9:40   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Kirill A. Shutemov @ 2018-04-30  9:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, Apr 18, 2018 at 11:49:04AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Keeping the refcount in the union only encourages people to put
> something else in the union which will overlap with _refcount and
> eventually explode messily.  pahole reports no fields change location.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 08/14] mm: Combine first three unions in struct page
  2018-04-18 18:49 ` [PATCH v3 08/14] mm: Combine first three unions in struct page Matthew Wilcox
  2018-04-19 13:46   ` Vlastimil Babka
@ 2018-04-30  9:42   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Kirill A. Shutemov @ 2018-04-30  9:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, Apr 18, 2018 at 11:49:06AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> By combining these three one-word unions into one three-word union,
> we make it easier for users to add their own multi-word fields to struct
> page, as well as making it obvious that SLUB needs to keep its double-word
> alignment for its freelist & counters.
> 
> No field moves position; verified with pahole.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/14] mm: Use page->deferred_list
  2018-04-18 18:49 ` [PATCH v3 09/14] mm: Use page->deferred_list Matthew Wilcox
  2018-04-19 13:23   ` Vlastimil Babka
@ 2018-04-30  9:43   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Kirill A. Shutemov @ 2018-04-30  9:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, Apr 18, 2018 at 11:49:07AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Now that we can represent the location of 'deferred_list' in C instead
> of comments, make use of that ability.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
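
In case it helps to picture the change, a sketch of what this enables
(placement simplified; the authoritative layout is the patched
include/linux/mm_types.h and the helper in mm/huge_memory.c):

	struct page {
		...
		struct list_head deferred_list;	/* used on the second tail
						   page of a THP awaiting a
						   deferred split */
		...
	};

	static inline struct list_head *page_deferred_list(struct page *page)
	{
		/* previously spelled as a cast over the mapping/index words */
		return &page[2].deferred_list;
	}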

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 10/14] mm: Move lru union within struct page
  2018-04-18 18:49 ` [PATCH v3 10/14] mm: Move lru union within struct page Matthew Wilcox
  2018-04-19 13:56   ` Vlastimil Babka
@ 2018-04-30  9:44   ` Kirill A. Shutemov
  1 sibling, 0 replies; 43+ messages in thread
From: Kirill A. Shutemov @ 2018-04-30  9:44 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, Apr 18, 2018 at 11:49:08AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> Since the LRU is two words, this does not affect the double-word
> alignment of SLUB's freelist.
> 
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
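
(For reference, the lru field is a struct list_head, i.e. exactly two
pointers:

	struct list_head {
		struct list_head *next, *prev;	/* 2 * sizeof(void *) */
	};

so moving it around can't upset the double-word alignment of SLUB's
freelist.)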

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 11/14] mm: Combine first two unions in struct page
  2018-04-18 18:49 ` [PATCH v3 11/14] mm: Combine first two unions in " Matthew Wilcox
  2018-04-19 14:03   ` Vlastimil Babka
@ 2018-04-30  9:47   ` Kirill A. Shutemov
  2018-04-30 12:42     ` Matthew Wilcox
  1 sibling, 1 reply; 43+ messages in thread
From: Kirill A. Shutemov @ 2018-04-30  9:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Wed, Apr 18, 2018 at 11:49:09AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> This gives us five words of space in a single union in struct page.
> The compound_mapcount moves position (from offset 24 to offset 20)
> on 64-bit systems, but that does not seem likely to cause any trouble.

Yeah, it should be fine.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 11/14] mm: Combine first two unions in struct page
  2018-04-30  9:47   ` Kirill A. Shutemov
@ 2018-04-30 12:42     ` Matthew Wilcox
  2018-04-30 13:12       ` Kirill A. Shutemov
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-04-30 12:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Mon, Apr 30, 2018 at 12:47:04PM +0300, Kirill A. Shutemov wrote:
> On Wed, Apr 18, 2018 at 11:49:09AM -0700, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> > 
> > This gives us five words of space in a single union in struct page.
> > The compound_mapcount moves position (from offset 24 to offset 20)
> > on 64-bit systems, but that does not seem likely to cause any trouble.
> 
> Yeah, it should be fine.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

I was wondering if it might make sense to make compound_mapcount an
atomic_long_t.  It'd guarantee no overflow, and prevent the location
from moving.
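
Something like this, for concreteness (a sketch of the idea only, not
part of the posted series; the neighbouring fields are from my reading
of mm_types.h and may differ in detail):

	struct {	/* tail pages of a compound page */
		unsigned long compound_head;		/* bit zero set */
		unsigned char compound_dtor;
		unsigned char compound_order;
		atomic_long_t compound_mapcount;	/* was atomic_t; natural
							   long alignment keeps
							   the offset fixed and
							   overflow out of reach */
	};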

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 11/14] mm: Combine first two unions in struct page
  2018-04-30 12:42     ` Matthew Wilcox
@ 2018-04-30 13:12       ` Kirill A. Shutemov
  0 siblings, 0 replies; 43+ messages in thread
From: Kirill A. Shutemov @ 2018-04-30 13:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka

On Mon, Apr 30, 2018 at 12:42:16PM +0000, Matthew Wilcox wrote:
> On Mon, Apr 30, 2018 at 12:47:04PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Apr 18, 2018 at 11:49:09AM -0700, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <mawilcox@microsoft.com>
> > > 
> > > This gives us five words of space in a single union in struct page.
> > > The compound_mapcount moves position (from offset 24 to offset 20)
> > > on 64-bit systems, but that does not seem likely to cause any trouble.
> > 
> > Yeah, it should be fine.
> > 
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> I was wondering if it might make sense to make compound_mapcount an
> atomic_long_t.  It'd guarantee no overflow, and prevent the location
> from moving.

It would only make sense if we changed mapcount too.

I mean, what's the point if the first split_huge_pmd() will cause the
overflow anyway?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2018-04-30 13:12 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-18 18:48 [PATCH v3 00/14] Rearrange struct page Matthew Wilcox
2018-04-18 18:48 ` [PATCH v3 01/14] s390: Use _refcount for pgtables Matthew Wilcox
2018-04-18 18:49 ` [PATCH v3 02/14] mm: Split page_type out from _mapcount Matthew Wilcox
2018-04-19  9:04   ` Vlastimil Babka
2018-04-19 11:16     ` Matthew Wilcox
2018-04-20 15:17   ` Christopher Lameter
2018-04-20 20:43     ` Matthew Wilcox
2018-04-18 18:49 ` [PATCH v3 03/14] mm: Mark pages in use for page tables Matthew Wilcox
2018-04-19  9:30   ` Vlastimil Babka
2018-04-18 18:49 ` [PATCH v3 04/14] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
2018-04-19 11:06   ` Vlastimil Babka
2018-04-19 11:19     ` Matthew Wilcox
2018-04-18 18:49 ` [PATCH v3 05/14] mm: Move 'private' union within " Matthew Wilcox
2018-04-19 11:31   ` Vlastimil Babka
2018-04-20 15:25   ` Christopher Lameter
2018-04-20 20:27     ` Matthew Wilcox
2018-04-30  9:38   ` Kirill A. Shutemov
2018-04-18 18:49 ` [PATCH v3 06/14] mm: Move _refcount out of struct page union Matthew Wilcox
2018-04-19 11:37   ` Vlastimil Babka
2018-04-30  9:40   ` Kirill A. Shutemov
2018-04-18 18:49 ` [PATCH v3 07/14] slub: Remove page->counters Matthew Wilcox
2018-04-19 13:42   ` Vlastimil Babka
2018-04-19 14:23     ` Matthew Wilcox
2018-04-18 18:49 ` [PATCH v3 08/14] mm: Combine first three unions in struct page Matthew Wilcox
2018-04-19 13:46   ` Vlastimil Babka
2018-04-19 14:08     ` Matthew Wilcox
2018-04-30  9:42   ` Kirill A. Shutemov
2018-04-18 18:49 ` [PATCH v3 09/14] mm: Use page->deferred_list Matthew Wilcox
2018-04-19 13:23   ` Vlastimil Babka
2018-04-30  9:43   ` Kirill A. Shutemov
2018-04-18 18:49 ` [PATCH v3 10/14] mm: Move lru union within struct page Matthew Wilcox
2018-04-19 13:56   ` Vlastimil Babka
2018-04-30  9:44   ` Kirill A. Shutemov
2018-04-18 18:49 ` [PATCH v3 11/14] mm: Combine first two unions in " Matthew Wilcox
2018-04-19 14:03   ` Vlastimil Babka
2018-04-30  9:47   ` Kirill A. Shutemov
2018-04-30 12:42     ` Matthew Wilcox
2018-04-30 13:12       ` Kirill A. Shutemov
2018-04-18 18:49 ` [PATCH v3 12/14] mm: Improve struct page documentation Matthew Wilcox
2018-04-18 23:32   ` Randy Dunlap
2018-04-18 23:43     ` Matthew Wilcox
2018-04-18 18:49 ` [PATCH v3 13/14] slab,slub: Remove rcu_head size checks Matthew Wilcox
2018-04-18 18:49 ` [PATCH v3 14/14] slub: Remove kmem_cache->reserved Matthew Wilcox
