* [PATCH v4 00/16] Rearrange struct page
@ 2018-04-30 20:22 Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 01/16] s390: Use _refcount for pgtables Matthew Wilcox
                   ` (15 more replies)
  0 siblings, 16 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

As presented at LSFMM last week, this patch-set rearranges struct page
to give more usable space to users who have allocated a struct page for
their own purposes.  For a graphical view of before-and-after, see the first
two tabs of https://docs.google.com/spreadsheets/d/1tvCszs_7FXrjei9_mtFiKV6nW1FLnYyvPvW-qNZhdog/edit?usp=sharing

Highlights:
 - slub's counters no longer share space with _refcount.
 - slub's freelist+counters are now naturally dword aligned.
 - It's now more obvious what fields in struct page are used by which
   owners (some owners still take advantage of the union aliasing).
 - deferred_list now really exists in struct page instead of just a
   comment.
 - slub loses a parameter to a lot of functions.
 - Several hidden uses of struct page are now documented in code.

Changes v3 -> v4:

 - Added acks/reviews from Kirill & Randy
 - Removed call to page_mapcount_reset from slub since it no longer uses
   mapcount union.
 - Added pt_mm and hmm_data to struct page

Matthew Wilcox (16):
  s390: Use _refcount for pgtables
  mm: Split page_type out from _mapcount
  mm: Mark pages in use for page tables
  mm: Switch s_mem and slab_cache in struct page
  mm: Move 'private' union within struct page
  mm: Move _refcount out of struct page union
  slub: Remove page->counters
  mm: Combine first three unions in struct page
  mm: Use page->deferred_list
  mm: Move lru union within struct page
  mm: Combine first two unions in struct page
  mm: Improve struct page documentation
  mm: Add pt_mm to struct page
  mm: Add hmm_data to struct page
  slab,slub: Remove rcu_head size checks
  slub: Remove kmem_cache->reserved

 arch/s390/mm/pgalloc.c                 |  21 ++-
 arch/x86/mm/pgtable.c                  |   5 +-
 fs/proc/page.c                         |   2 +
 include/linux/hmm.h                    |   8 +-
 include/linux/mm.h                     |   2 +
 include/linux/mm_types.h               | 218 ++++++++++++-------------
 include/linux/page-flags.h             |  51 +++---
 include/linux/slub_def.h               |   1 -
 include/uapi/linux/kernel-page-flags.h |   2 +-
 kernel/crash_core.c                    |   1 +
 mm/huge_memory.c                       |   7 +-
 mm/page_alloc.c                        |  17 +-
 mm/slab.c                              |   2 -
 mm/slub.c                              | 138 ++++++----------
 scripts/tags.sh                        |   6 +-
 tools/vm/page-types.c                  |   1 +
 16 files changed, 222 insertions(+), 260 deletions(-)

-- 
2.17.0


* [PATCH v4 01/16] s390: Use _refcount for pgtables
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 02/16] mm: Split page_type out from _mapcount Matthew Wilcox
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

s390 borrows the storage used for _mapcount in struct page in order to
account whether the bottom or top half is being used for 2kB page
tables.  I want to use that for something else, so use the top byte of
_refcount instead of the bottom byte of _mapcount.  _refcount may
temporarily be incremented by other CPUs that see a stale pointer to
this page in the page cache, but each CPU can only increment it by one,
and there are no systems with 2^24 CPUs today, so they will not change
the upper byte of _refcount.  We do have to be a little careful not to
lose any of their writes (as they will subsequently decrement the
counter).
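
As a userspace illustration of the layout (a simplified model, not the
kernel code; the real helper is s390's atomic_xor_bits(), visible in the
diff below), the two "half in use" bits live in bits 24-25, above the 24
bits that speculative references can reach:

	#include <stdatomic.h>
	#include <stdio.h>

	/* Model of page->_refcount: the low 24 bits count references,
	 * bits 24-25 record which 2K halves hold page tables.
	 */
	static atomic_uint refcount = 1;

	static unsigned int xor_bits(atomic_uint *v, unsigned int bits)
	{
		return atomic_fetch_xor(v, bits) ^ bits; /* new value */
	}

	int main(void)
	{
		unsigned int mask;

		mask = xor_bits(&refcount, 1U << 24); /* claim lower 2K */
		printf("halves in use: %u\n", (mask >> 24) & 3);
		mask = xor_bits(&refcount, 1U << 24); /* release it */
		printf("halves in use: %u\n", (mask >> 24) & 3);
		return 0;
	}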

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/mm/pgalloc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 562f72955956..84bd6329a88d 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -190,14 +190,15 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 		if (!list_empty(&mm->context.pgtable_list)) {
 			page = list_first_entry(&mm->context.pgtable_list,
 						struct page, lru);
-			mask = atomic_read(&page->_mapcount);
+			mask = atomic_read(&page->_refcount) >> 24;
 			mask = (mask | (mask >> 4)) & 3;
 			if (mask != 3) {
 				table = (unsigned long *) page_to_phys(page);
 				bit = mask & 1;		/* =1 -> second 2K */
 				if (bit)
 					table += PTRS_PER_PTE;
-				atomic_xor_bits(&page->_mapcount, 1U << bit);
+				atomic_xor_bits(&page->_refcount,
+							1U << (bit + 24));
 				list_del(&page->lru);
 			}
 		}
@@ -218,12 +219,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	table = (unsigned long *) page_to_phys(page);
 	if (mm_alloc_pgste(mm)) {
 		/* Return 4K page table with PGSTEs */
-		atomic_set(&page->_mapcount, 3);
+		atomic_xor_bits(&page->_refcount, 3 << 24);
 		memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
 		memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
 	} else {
 		/* Return the first 2K fragment of the page */
-		atomic_set(&page->_mapcount, 1);
+		atomic_xor_bits(&page->_refcount, 1 << 24);
 		memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
 		spin_lock_bh(&mm->context.lock);
 		list_add(&page->lru, &mm->context.pgtable_list);
@@ -242,7 +243,8 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 		/* Free 2K page table fragment of a 4K page */
 		bit = (__pa(table) & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t));
 		spin_lock_bh(&mm->context.lock);
-		mask = atomic_xor_bits(&page->_mapcount, 1U << bit);
+		mask = atomic_xor_bits(&page->_refcount, 1U << (bit + 24));
+		mask >>= 24;
 		if (mask & 3)
 			list_add(&page->lru, &mm->context.pgtable_list);
 		else
@@ -253,7 +255,6 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	}
 
 	pgtable_page_dtor(page);
-	atomic_set(&page->_mapcount, -1);
 	__free_page(page);
 }
 
@@ -274,7 +275,8 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 	}
 	bit = (__pa(table) & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t));
 	spin_lock_bh(&mm->context.lock);
-	mask = atomic_xor_bits(&page->_mapcount, 0x11U << bit);
+	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
+	mask >>= 24;
 	if (mask & 3)
 		list_add_tail(&page->lru, &mm->context.pgtable_list);
 	else
@@ -296,12 +298,13 @@ static void __tlb_remove_table(void *_table)
 		break;
 	case 1:		/* lower 2K of a 4K page table */
 	case 2:		/* higher 2K of a 4K page table */
-		if (atomic_xor_bits(&page->_mapcount, mask << 4) != 0)
+		mask = atomic_xor_bits(&page->_refcount, mask << (4 + 24));
+		mask >>= 24;
+		if (mask != 0)
 			break;
 		/* fallthrough */
 	case 3:		/* 4K page table with pgstes */
 		pgtable_page_dtor(page);
-		atomic_set(&page->_mapcount, -1);
 		__free_page(page);
 		break;
 	}
-- 
2.17.0


* [PATCH v4 02/16] mm: Split page_type out from _mapcount
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 01/16] s390: Use _refcount for pgtables Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 03/16] mm: Mark pages in use for page tables Matthew Wilcox
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

We're already using a union of many fields here, so stop abusing the
_mapcount and make page_type its own field.  That implies renaming some
of the machinery that creates PageBuddy, PageBalloon and PageKmemcg;
bring back the PG_buddy, PG_balloon and PG_kmemcg names.

As suggested by Kirill, make page_type a bitmask.  Because it starts out
life as -1 (thanks to sharing the storage with _mapcount), setting a
page flag means clearing the appropriate bit.  This gives us space for
probably twenty or so extra bits (depending on how paranoid we want to be
about _mapcount underflow).
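
To make the inverted sense concrete, here is a standalone sketch using
the values this patch defines (PAGE_TYPE_BASE and PG_buddy, from the
diff below):

	#include <assert.h>

	#define PAGE_TYPE_BASE	0xf0000000
	#define PG_buddy	0x00000080

	/* A type is "set" while its bit is clear and the high nibble
	 * is still intact.
	 */
	#define PageType(pt, flag) \
		(((pt) & (PAGE_TYPE_BASE | (flag))) == PAGE_TYPE_BASE)

	int main(void)
	{
		unsigned int page_type = 0xffffffff; /* _mapcount == -1 */

		assert(!PageType(page_type, PG_buddy));
		page_type &= ~PG_buddy;		/* __SetPageBuddy */
		assert(PageType(page_type, PG_buddy));
		page_type |= PG_buddy;		/* __ClearPageBuddy */
		assert(!PageType(page_type, PG_buddy));
		return 0;
	}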

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h   | 13 ++++++-----
 include/linux/page-flags.h | 45 ++++++++++++++++++++++----------------
 kernel/crash_core.c        |  1 +
 mm/page_alloc.c            | 13 +++++------
 scripts/tags.sh            |  6 ++---
 5 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21612347d311..41828fb34860 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -96,6 +96,14 @@ struct page {
 	};
 
 	union {
+		/*
+		 * If the page is neither PageSlab nor mappable to userspace,
+		 * the value stored here may help determine what this page
+		 * is used for.  See page-flags.h for a list of page types
+		 * which are currently stored here.
+		 */
+		unsigned int page_type;
+
 		_slub_counter_t counters;
 		unsigned int active;		/* SLAB */
 		struct {			/* SLUB */
@@ -109,11 +117,6 @@ struct page {
 			/*
 			 * Count of ptes mapped in mms, to show when
 			 * page is mapped & limit reverse map searches.
-			 *
-			 * Extra information about page type may be
-			 * stored here for pages that are never mapped,
-			 * in which case the value MUST BE <= -2.
-			 * See page-flags.h for more details.
 			 */
 			atomic_t _mapcount;
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..8c25b28a35aa 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -642,49 +642,56 @@ PAGEFLAG_FALSE(DoubleMap)
 #endif
 
 /*
- * For pages that are never mapped to userspace, page->mapcount may be
- * used for storing extra information about page type. Any value used
- * for this purpose must be <= -2, but it's better start not too close
- * to -2 so that an underflow of the page_mapcount() won't be mistaken
- * for a special page.
+ * For pages that are never mapped to userspace (and aren't PageSlab),
+ * page_type may be used.  Because it is initialised to -1, we invert the
+ * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
+ * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
+ * low bits so that an underflow or overflow of page_mapcount() won't be
+ * mistaken for a page type value.
  */
-#define PAGE_MAPCOUNT_OPS(uname, lname)					\
+
+#define PAGE_TYPE_BASE	0xf0000000
+/* Reserve		0x0000007f to catch underflows of page_mapcount */
+#define PG_buddy	0x00000080
+#define PG_balloon	0x00000100
+#define PG_kmemcg	0x00000200
+
+#define PageType(page, flag)						\
+	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+
+#define PAGE_TYPE_OPS(uname, lname)					\
 static __always_inline int Page##uname(struct page *page)		\
 {									\
-	return atomic_read(&page->_mapcount) ==				\
-				PAGE_##lname##_MAPCOUNT_VALUE;		\
+	return PageType(page, PG_##lname);				\
 }									\
 static __always_inline void __SetPage##uname(struct page *page)		\
 {									\
-	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);	\
-	atomic_set(&page->_mapcount, PAGE_##lname##_MAPCOUNT_VALUE);	\
+	VM_BUG_ON_PAGE(!PageType(page, 0), page);			\
+	page->page_type &= ~PG_##lname;					\
 }									\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 {									\
 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
-	atomic_set(&page->_mapcount, -1);				\
+	page->page_type |= PG_##lname;					\
 }
 
 /*
- * PageBuddy() indicate that the page is free and in the buddy system
+ * PageBuddy() indicates that the page is free and in the buddy system
  * (see mm/page_alloc.c).
  */
-#define PAGE_BUDDY_MAPCOUNT_VALUE		(-128)
-PAGE_MAPCOUNT_OPS(Buddy, BUDDY)
+PAGE_TYPE_OPS(Buddy, buddy)
 
 /*
- * PageBalloon() is set on pages that are on the balloon page list
+ * PageBalloon() is true for pages that are on the balloon page list
  * (see mm/balloon_compaction.c).
  */
-#define PAGE_BALLOON_MAPCOUNT_VALUE		(-256)
-PAGE_MAPCOUNT_OPS(Balloon, BALLOON)
+PAGE_TYPE_OPS(Balloon, balloon)
 
 /*
  * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
  * pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
  */
-#define PAGE_KMEMCG_MAPCOUNT_VALUE		(-512)
-PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
+PAGE_TYPE_OPS(Kmemcg, kmemcg)
 
 extern bool is_free_buddy_page(struct page *page);
 
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index f7674d676889..b66aced5e8c2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -460,6 +460,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_hwpoison);
 #endif
 	VMCOREINFO_NUMBER(PG_head_mask);
+#define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 #ifdef CONFIG_HUGETLB_PAGE
 	VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7eebd6925b10..88e817d7ccef 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -706,16 +706,14 @@ static inline void rmv_page_order(struct page *page)
 
 /*
  * This function checks whether a page is free && is the buddy
- * we can do coalesce a page and its buddy if
+ * we can coalesce a page and its buddy if
  * (a) the buddy is not in a hole (check before calling!) &&
  * (b) the buddy is in the buddy system &&
  * (c) a page and its buddy have the same order &&
  * (d) a page and its buddy are in the same zone.
  *
- * For recording whether a page is in the buddy system, we set ->_mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE.
- * Setting, clearing, and testing _mapcount PAGE_BUDDY_MAPCOUNT_VALUE is
- * serialized by zone->lock.
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
  *
  * For recording page's order, we use page_private(page).
  */
@@ -760,9 +758,8 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
  * as necessary, plus some accounting needed to play nicely with other
  * parts of the VM system.
  * At each level, we keep a list of pages, which are heads of continuous
- * free pages of length of (1 << order) and marked with _mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE. Page's order is recorded in page_private(page)
- * field.
+ * free pages of length of (1 << order) and marked with PageBuddy.
+ * Page's order is recorded in page_private(page) field.
  * So when we are allocating or freeing one, we can derive the state of the
  * other.  That is, if we allocate a small block, and both were
  * free, the remainder of the region must be split into blocks.
diff --git a/scripts/tags.sh b/scripts/tags.sh
index 78e546ff689c..8c3ae36d4ea8 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -188,9 +188,9 @@ regex_c=(
 	'/\<CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/ClearPage\1/'
 	'/\<__CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/__ClearPage\1/'
 	'/\<TESTCLEARFLAG_FALSE(\([[:alnum:]_]*\).*/TestClearPage\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/Page\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/Page\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
 	'/^TASK_PFA_TEST([^,]*, *\([[:alnum:]_]*\))/task_\1/'
 	'/^TASK_PFA_SET([^,]*, *\([[:alnum:]_]*\))/task_set_\1/'
 	'/^TASK_PFA_CLEAR([^,]*, *\([[:alnum:]_]*\))/task_clear_\1/'
-- 
2.17.0


* [PATCH v4 03/16] mm: Mark pages in use for page tables
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 01/16] s390: Use _refcount for pgtables Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 02/16] mm: Split page_type out from _mapcount Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 04/16] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables.  This can be helpful when debugging crashdumps or
analysing memory fragmentation.  Add a KPF flag to report these pages
to userspace and update page-types.c to interpret that flag.

Note that only pages currently accounted as NR_PAGETABLE are tracked
as PageTable; this does not include pgd/p4d/pud/pmd pages.  Those will
be the subject of a later patch.
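
A quick way to consume the new flag from userspace (a sketch assuming
bit 26, per the KPF_PGTABLE value below; error handling trimmed):

	#include <fcntl.h>
	#include <inttypes.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		uint64_t pfn, flags;
		int fd;

		if (argc < 2)
			return 1;
		pfn = strtoull(argv[1], NULL, 0);
		fd = open("/proc/kpageflags", O_RDONLY);

		/* one u64 of KPF_* bits per page frame */
		pread(fd, &flags, sizeof(flags), pfn * sizeof(flags));
		printf("pfn %" PRIu64 " %s a page table\n", pfn,
		       (flags >> 26) & 1 ? "is" : "is not");
		close(fd);
		return 0;
	}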

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 fs/proc/page.c                         | 2 ++
 include/linux/mm.h                     | 2 ++
 include/linux/page-flags.h             | 6 ++++++
 include/uapi/linux/kernel-page-flags.h | 2 +-
 tools/vm/page-types.c                  | 1 +
 5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 1491918a33c3..792c78a49174 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -154,6 +154,8 @@ u64 stable_page_flags(struct page *page)
 
 	if (PageBalloon(page))
 		u |= 1 << KPF_BALLOON;
+	if (PageTable(page))
+		u |= 1 << KPF_PGTABLE;
 
 	if (page_is_idle(page))
 		u |= 1 << KPF_IDLE;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 974e8f8ffe03..5c6069219425 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1819,6 +1819,7 @@ static inline bool pgtable_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
 		return false;
+	__SetPageTable(page);
 	inc_zone_page_state(page, NR_PAGETABLE);
 	return true;
 }
@@ -1826,6 +1827,7 @@ static inline bool pgtable_page_ctor(struct page *page)
 static inline void pgtable_page_dtor(struct page *page)
 {
 	pte_lock_deinit(page);
+	__ClearPageTable(page);
 	dec_zone_page_state(page, NR_PAGETABLE);
 }
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8c25b28a35aa..901943e4754b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -655,6 +655,7 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PG_buddy	0x00000080
 #define PG_balloon	0x00000100
 #define PG_kmemcg	0x00000200
+#define PG_table	0x00000400
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -693,6 +694,11 @@ PAGE_TYPE_OPS(Balloon, balloon)
  */
 PAGE_TYPE_OPS(Kmemcg, kmemcg)
 
+/*
+ * Marks pages in use as page tables.
+ */
+PAGE_TYPE_OPS(Table, table)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index fa139841ec18..21b9113c69da 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -35,6 +35,6 @@
 #define KPF_BALLOON		23
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
-
+#define KPF_PGTABLE		26
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index a8783f48f77f..cce853dca691 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -131,6 +131,7 @@ static const char * const page_flag_names[] = {
 	[KPF_KSM]		= "x:ksm",
 	[KPF_THP]		= "t:thp",
 	[KPF_BALLOON]		= "o:balloon",
+	[KPF_PGTABLE]		= "g:pgtable",
 	[KPF_ZERO_PAGE]		= "z:zero_page",
 	[KPF_IDLE]              = "i:idle_page",
 
-- 
2.17.0


* [PATCH v4 04/16] mm: Switch s_mem and slab_cache in struct page
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (2 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 03/16] mm: Mark pages in use for page tables Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 05/16] mm: Move 'private' union within " Matthew Wilcox
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

This will allow us to store slub's counters in the same bits as slab's
s_mem.  slub now needs to set page->mapping to NULL as it frees the page,
just like slab does.
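
The reason mapping must be cleared is the union aliasing: after this
patch slab_cache shares storage with page->mapping, so a stale value
would be misread as an address_space pointer.  A stand-in sketch of the
overlap (not the real struct page):

	#include <assert.h>
	#include <stddef.h>

	struct address_space;
	struct kmem_cache;

	struct fake_page {
		unsigned long flags;
		union {
			struct address_space *mapping;
			struct kmem_cache *slab_cache;	/* same storage */
		};
	};

	static_assert(offsetof(struct fake_page, mapping) ==
		      offsetof(struct fake_page, slab_cache),
		      "slab_cache aliases mapping; NULL it before freeing");

	int main(void) { return 0; }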

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 4 ++--
 mm/slub.c                | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 41828fb34860..e97a310a6abe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -83,7 +83,7 @@ struct page {
 		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
 		struct address_space *mapping;
 
-		void *s_mem;			/* slab first object */
+		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
 		atomic_t compound_mapcount;	/* first tail page */
 		/* page_deferred_list().next	 -- second tail page */
 	};
@@ -194,7 +194,7 @@ struct page {
 		spinlock_t ptl;
 #endif
 #endif
-		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
+		void *s_mem;			/* slab first object */
 	};
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/slub.c b/mm/slub.c
index e938184ac847..7fc13c46e975 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1690,6 +1690,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__ClearPageSlab(page);
 
 	page_mapcount_reset(page);
+	page->mapping = NULL;
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
 	memcg_uncharge_slab(page, order, s);
-- 
2.17.0


* [PATCH v4 05/16] mm: Move 'private' union within struct page
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (3 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 04/16] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 06/16] mm: Move _refcount out of struct page union Matthew Wilcox
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

By moving page->private to the fourth word of struct page, we can put
the SLUB counters in the same word as SLAB's s_mem and still do the
cmpxchg_double trick.  Now the SLUB counters no longer overlap with the
mapcount or refcount so we can drop the call to page_mapcount_reset().
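
The cmpxchg_double trick needs the two words to be adjacent and aligned
to twice the pointer size so that both can be swapped as a single unit.
A minimal userspace sketch with the generic GCC/Clang builtin (an
illustrative stand-in, not the kernel's cmpxchg_double(); on x86-64 it
wants -mcx16 to emit cmpxchg16b inline):

	#include <stdbool.h>

	struct freelist_counters {
		void *freelist;
		unsigned long counters;
	} __attribute__((aligned(2 * sizeof(void *))));

	static bool try_update(struct freelist_counters *p,
			       struct freelist_counters old,
			       struct freelist_counters new)
	{
		/* swaps both words atomically, or neither */
		return __atomic_compare_exchange(p, &old, &new, false,
				__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	}

	int main(void)
	{
		struct freelist_counters fc = { 0, 0 };
		struct freelist_counters old = { 0, 0 }, new = { 0, 1 };

		return try_update(&fc, old, new) ? 0 : 1;
	}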

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 56 ++++++++++++++++++----------------------
 mm/slub.c                |  1 -
 2 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e97a310a6abe..23378a789af4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -65,15 +65,9 @@ struct hmm;
  */
 #ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
 #define _struct_page_alignment	__aligned(2 * sizeof(unsigned long))
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE)
-#define _slub_counter_t		unsigned long
 #else
-#define _slub_counter_t		unsigned int
-#endif
-#else /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
 #define _struct_page_alignment
-#define _slub_counter_t		unsigned int
-#endif /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
+#endif
 
 struct page {
 	/* First double word block */
@@ -95,6 +89,30 @@ struct page {
 		/* page_deferred_list().prev	-- second tail page */
 	};
 
+	union {
+		/*
+		 * Mapping-private opaque data:
+		 * Usually used for buffer_heads if PagePrivate
+		 * Used for swp_entry_t if PageSwapCache
+		 * Indicates order in the buddy system if PageBuddy
+		 */
+		unsigned long private;
+#if USE_SPLIT_PTE_PTLOCKS
+#if ALLOC_SPLIT_PTLOCKS
+		spinlock_t *ptl;
+#else
+		spinlock_t ptl;
+#endif
+#endif
+		void *s_mem;			/* slab first object */
+		unsigned long counters;		/* SLUB */
+		struct {			/* SLUB */
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
+		};
+	};
+
 	union {
 		/*
 		 * If the page is neither PageSlab nor mappable to userspace,
@@ -104,13 +122,7 @@ struct page {
 		 */
 		unsigned int page_type;
 
-		_slub_counter_t counters;
 		unsigned int active;		/* SLAB */
-		struct {			/* SLUB */
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
-		};
 		int units;			/* SLOB */
 
 		struct {			/* Page cache */
@@ -179,24 +191,6 @@ struct page {
 #endif
 	};
 
-	union {
-		/*
-		 * Mapping-private opaque data:
-		 * Usually used for buffer_heads if PagePrivate
-		 * Used for swp_entry_t if PageSwapCache
-		 * Indicates order in the buddy system if PageBuddy
-		 */
-		unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
-#if ALLOC_SPLIT_PTLOCKS
-		spinlock_t *ptl;
-#else
-		spinlock_t ptl;
-#endif
-#endif
-		void *s_mem;			/* slab first object */
-	};
-
 #ifdef CONFIG_MEMCG
 	struct mem_cgroup *mem_cgroup;
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 7fc13c46e975..0b4b58740ed8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1689,7 +1689,6 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
 
-	page_mapcount_reset(page);
 	page->mapping = NULL;
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
-- 
2.17.0


* [PATCH v4 06/16] mm: Move _refcount out of struct page union
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (4 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 05/16] mm: Move 'private' union within " Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 07/16] slub: Remove page->counters Matthew Wilcox
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Keeping the refcount in the union only encourages people to put
something else in the union which will overlap with _refcount and
eventually explode messily.  pahole reports no fields change location.
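
The failure mode being avoided is easy to demonstrate (a contrived
illustration, not kernel code):

	#include <stdio.h>

	union risky {
		int _refcount;
		void *somebody_elses_pointer;	/* overlaps the count */
	};

	int main(void)
	{
		union risky r = { ._refcount = 1 };

		r.somebody_elses_pointer = &r;	/* clobbers _refcount */
		printf("_refcount is now %d\n", r._refcount);
		return 0;
	}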

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 23378a789af4..9828cd170251 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -113,7 +113,13 @@ struct page {
 		};
 	};
 
-	union {
+	union {		/* This union is 4 bytes in size. */
+		/*
+		 * If the page can be mapped to userspace, encodes the number
+		 * of times this page is referenced by a page table.
+		 */
+		atomic_t _mapcount;
+
 		/*
 		 * If the page is neither PageSlab nor mappable to userspace,
 		 * the value stored here may help determine what this page
@@ -124,22 +130,11 @@ struct page {
 
 		unsigned int active;		/* SLAB */
 		int units;			/* SLOB */
-
-		struct {			/* Page cache */
-			/*
-			 * Count of ptes mapped in mms, to show when
-			 * page is mapped & limit reverse map searches.
-			 */
-			atomic_t _mapcount;
-
-			/*
-			 * Usage count, *USE WRAPPER FUNCTION* when manual
-			 * accounting. See page_ref.h
-			 */
-			atomic_t _refcount;
-		};
 	};
 
+	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
+	atomic_t _refcount;
+
 	/*
 	 * WARNING: bit 0 of the first word encode PageTail(). That means
 	 * the rest users of the storage space MUST NOT use the bit to
-- 
2.17.0


* [PATCH v4 07/16] slub: Remove page->counters
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (5 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 06/16] mm: Move _refcount out of struct page union Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-05-01 16:48   ` Christopher Lameter
  2018-04-30 20:22 ` [PATCH v4 08/16] mm: Combine first three unions in struct page Matthew Wilcox
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Use page->private instead, now that these two fields are in the same
location.  Include a compile-time assert that the fields don't get out
of sync.
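
A userspace analogue of those BUILD_BUG_ON() checks, using a stand-in
struct (the _pad word stands for the mapping word of struct page):

	#include <assert.h>
	#include <stddef.h>

	struct fake_page {
		unsigned long flags;
		unsigned long _pad;	/* stand-in for 'mapping' */
		void *freelist;
		unsigned long private;	/* the old 'counters' */
	};

	static_assert(offsetof(struct fake_page, freelist) + sizeof(void *)
		      == offsetof(struct fake_page, private),
		      "freelist and private must stay adjacent");
	static_assert(offsetof(struct fake_page, freelist) %
		      (2 * sizeof(void *)) == 0,
		      "freelist must sit on a double-word boundary");

	int main(void) { return 0; }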

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h |  1 -
 mm/slub.c                | 68 ++++++++++++++++++----------------------
 2 files changed, 31 insertions(+), 38 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9828cd170251..04d9dc442029 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -105,7 +105,6 @@ struct page {
 #endif
 #endif
 		void *s_mem;			/* slab first object */
-		unsigned long counters;		/* SLUB */
 		struct {			/* SLUB */
 			unsigned inuse:16;
 			unsigned objects:15;
diff --git a/mm/slub.c b/mm/slub.c
index 0b4b58740ed8..04625e3dab13 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -55,8 +55,9 @@
  *   have the ability to do a cmpxchg_double. It only protects the second
  *   double word in the page struct. Meaning
  *	A. page->freelist	-> List of object free in a page
- *	B. page->counters	-> Counters of objects
- *	C. page->frozen		-> frozen state
+ *	B. page->inuse		-> Number of objects in use
+ *	C. page->objects	-> Number of objects in page
+ *	D. page->frozen		-> frozen state
  *
  *   If a slab is frozen then it is exempt from list management. It is not
  *   on any list. The processor that froze the slab is the one who can
@@ -358,17 +359,10 @@ static __always_inline void slab_unlock(struct page *page)
 
 static inline void set_page_slub_counters(struct page *page, unsigned long counters_new)
 {
-	struct page tmp;
-	tmp.counters = counters_new;
-	/*
-	 * page->counters can cover frozen/inuse/objects as well
-	 * as page->_refcount.  If we assign to ->counters directly
-	 * we run the risk of losing updates to page->_refcount, so
-	 * be careful and only assign to the fields we need.
-	 */
-	page->frozen  = tmp.frozen;
-	page->inuse   = tmp.inuse;
-	page->objects = tmp.objects;
+	BUILD_BUG_ON(offsetof(struct page, freelist) + sizeof(void *) !=
+			offsetof(struct page, private));
+	BUILD_BUG_ON(offsetof(struct page, freelist) % (2 * sizeof(void *)));
+	page->private = counters_new;
 }
 
 /* Interrupts must be disabled (for the fallback code to work right) */
@@ -381,7 +375,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(&page->freelist, &page->private,
 				   freelist_old, counters_old,
 				   freelist_new, counters_new))
 			return true;
@@ -390,7 +384,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
 	{
 		slab_lock(page);
 		if (page->freelist == freelist_old &&
-					page->counters == counters_old) {
+					page->private == counters_old) {
 			page->freelist = freelist_new;
 			set_page_slub_counters(page, counters_new);
 			slab_unlock(page);
@@ -417,7 +411,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
     defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
 	if (s->flags & __CMPXCHG_DOUBLE) {
-		if (cmpxchg_double(&page->freelist, &page->counters,
+		if (cmpxchg_double(&page->freelist, &page->private,
 				   freelist_old, counters_old,
 				   freelist_new, counters_new))
 			return true;
@@ -429,7 +423,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		local_irq_save(flags);
 		slab_lock(page);
 		if (page->freelist == freelist_old &&
-					page->counters == counters_old) {
+					page->private == counters_old) {
 			page->freelist = freelist_new;
 			set_page_slub_counters(page, counters_new);
 			slab_unlock(page);
@@ -1787,8 +1781,8 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	 * per cpu allocation list.
 	 */
 	freelist = page->freelist;
-	counters = page->counters;
-	new.counters = counters;
+	counters = page->private;
+	new.private = counters;
 	*objects = new.objects - new.inuse;
 	if (mode) {
 		new.inuse = page->objects;
@@ -1802,7 +1796,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
 
 	if (!__cmpxchg_double_slab(s, page,
 			freelist, counters,
-			new.freelist, new.counters,
+			new.freelist, new.private,
 			"acquire_slab"))
 		return NULL;
 
@@ -2049,15 +2043,15 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 
 		do {
 			prior = page->freelist;
-			counters = page->counters;
+			counters = page->private;
 			set_freepointer(s, freelist, prior);
-			new.counters = counters;
+			new.private = counters;
 			new.inuse--;
 			VM_BUG_ON(!new.frozen);
 
 		} while (!__cmpxchg_double_slab(s, page,
 			prior, counters,
-			freelist, new.counters,
+			freelist, new.private,
 			"drain percpu freelist"));
 
 		freelist = nextfree;
@@ -2080,11 +2074,11 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 redo:
 
 	old.freelist = page->freelist;
-	old.counters = page->counters;
+	old.private = page->private;
 	VM_BUG_ON(!old.frozen);
 
 	/* Determine target state of the slab */
-	new.counters = old.counters;
+	new.private = old.private;
 	if (freelist) {
 		new.inuse--;
 		set_freepointer(s, freelist, old.freelist);
@@ -2145,8 +2139,8 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
 
 	l = m;
 	if (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				old.freelist, old.private,
+				new.freelist, new.private,
 				"unfreezing slab"))
 		goto redo;
 
@@ -2195,17 +2189,17 @@ static void unfreeze_partials(struct kmem_cache *s,
 		do {
 
 			old.freelist = page->freelist;
-			old.counters = page->counters;
+			old.private = page->private;
 			VM_BUG_ON(!old.frozen);
 
-			new.counters = old.counters;
+			new.private = old.private;
 			new.freelist = old.freelist;
 
 			new.frozen = 0;
 
 		} while (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
+				old.freelist, old.private,
+				new.freelist, new.private,
 				"unfreezing slab"));
 
 		if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
@@ -2494,9 +2488,9 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 
 	do {
 		freelist = page->freelist;
-		counters = page->counters;
+		counters = page->private;
 
-		new.counters = counters;
+		new.private = counters;
 		VM_BUG_ON(!new.frozen);
 
 		new.inuse = page->objects;
@@ -2504,7 +2498,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 
 	} while (!__cmpxchg_double_slab(s, page,
 		freelist, counters,
-		NULL, new.counters,
+		NULL, new.private,
 		"get_freelist"));
 
 	return freelist;
@@ -2829,9 +2823,9 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 			n = NULL;
 		}
 		prior = page->freelist;
-		counters = page->counters;
+		counters = page->private;
 		set_freepointer(s, tail, prior);
-		new.counters = counters;
+		new.private = counters;
 		was_frozen = new.frozen;
 		new.inuse -= cnt;
 		if ((!new.inuse || !prior) && !was_frozen) {
@@ -2864,7 +2858,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 
 	} while (!cmpxchg_double_slab(s, page,
 		prior, counters,
-		head, new.counters,
+		head, new.private,
 		"__slab_free"));
 
 	if (likely(!n)) {
-- 
2.17.0


* [PATCH v4 08/16] mm: Combine first three unions in struct page
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (6 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 07/16] slub: Remove page->counters Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 09/16] mm: Use page->deferred_list Matthew Wilcox
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

By combining these three one-word unions into one three-word union,
we make it easier for users to add their own multi-word fields to struct
page, as well as making it obvious that SLUB needs to keep its double-word
alignment for its freelist & counters.

No field moves position; verified with pahole.
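
A union of anonymous structs aliases word-by-word, which is what lets
the double-word requirement be expressed in the source.  A toy check
over an assumed subset of the fields:

	#include <assert.h>
	#include <stddef.h>

	struct demo {
		unsigned long flags;
		union {			/* three words, as in the patch */
			struct {
				void *mapping;
				unsigned long index;
				unsigned long private;
			};
			struct {
				void *cache;
				void *freelist;	/* double-word boundary */
				void *s_mem;
			} slab;
		};
	};

	static_assert(offsetof(struct demo, index) ==
		      offsetof(struct demo, slab.freelist),
		      "word two aliases across the union");
	static_assert(offsetof(struct demo, private) ==
		      offsetof(struct demo, slab.s_mem),
		      "word three aliases across the union");

	int main(void) { return 0; }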

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 65 ++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 04d9dc442029..f0ccb699641d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -70,45 +70,44 @@ struct hmm;
 #endif
 
 struct page {
-	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
-	union {
-		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
-		struct address_space *mapping;
-
-		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
+	union {		/* This union is three words (12/24 bytes) in size */
+		struct {	/* Page cache and anonymous pages */
+			/* See page-flags.h for PAGE_MAPPING_FLAGS */
+			struct address_space *mapping;
+			pgoff_t index;		/* Our offset within mapping. */
+			/**
+			 * @private: Mapping-private opaque data.
+			 * Usually used for buffer_heads if PagePrivate.
+			 * Used for swp_entry_t if PageSwapCache.
+			 * Indicates order in the buddy system if PageBuddy.
+			 */
+			unsigned long private;
+		};
+		struct {	/* slab, slob and slub */
+			struct kmem_cache *slab_cache;	/* (slub) */
+			void *freelist;		/* first free object (slub) */
+			void *s_mem;		/* first object */
+		};
+		struct {	/* slub */
+			struct kmem_cache *slub_cache;	/* shared with slab */
+			/* Double-word boundary */
+			void *slub_freelist;		/* shared with slab */
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
+		};
 		atomic_t compound_mapcount;	/* first tail page */
-		/* page_deferred_list().next	 -- second tail page */
-	};
-
-	/* Second double word */
-	union {
-		pgoff_t index;		/* Our offset within mapping. */
-		void *freelist;		/* sl[aou]b first free object */
-		/* page_deferred_list().prev	-- second tail page */
-	};
-
-	union {
-		/*
-		 * Mapping-private opaque data:
-		 * Usually used for buffer_heads if PagePrivate
-		 * Used for swp_entry_t if PageSwapCache
-		 * Indicates order in the buddy system if PageBuddy
-		 */
-		unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
+		struct list_head deferred_list; /* second tail page */
+		struct {	/* Page table pages */
+			unsigned long _pt_pad_2;	/* mapping */
+			unsigned long _pt_pad_3;
 #if ALLOC_SPLIT_PTLOCKS
-		spinlock_t *ptl;
+			spinlock_t *ptl;
 #else
-		spinlock_t ptl;
-#endif
+			spinlock_t ptl;
 #endif
-		void *s_mem;			/* slab first object */
-		struct {			/* SLUB */
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
 		};
 	};
 
-- 
2.17.0


* [PATCH v4 09/16] mm: Use page->deferred_list
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (7 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 08/16] mm: Combine first three unions in struct page Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 10/16] mm: Move lru union within struct page Matthew Wilcox
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Now that we can represent the location of 'deferred_list' in C instead
of comments, make use of that ability.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 7 ++-----
 mm/page_alloc.c  | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a3a1815f8e11..cb0954a6de88 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -483,11 +483,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 static inline struct list_head *page_deferred_list(struct page *page)
 {
-	/*
-	 * ->lru in the tail pages is occupied by compound_head.
-	 * Let's use ->mapping + ->index in the second tail page as list_head.
-	 */
-	return (struct list_head *)&page[2].mapping;
+	/* ->lru in the tail pages is occupied by compound_head. */
+	return &page[2].deferred_list;
 }
 
 void prep_transhuge_page(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 88e817d7ccef..18720eccbce1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -953,7 +953,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	case 2:
 		/*
 		 * the second tail page: ->mapping is
-		 * page_deferred_list().next -- ignore value.
+		 * deferred_list.next -- ignore value.
 		 */
 		break;
 	default:
-- 
2.17.0


* [PATCH v4 10/16] mm: Move lru union within struct page
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (8 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 09/16] mm: Use page->deferred_list Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 11/16] mm: Combine first two unions in " Matthew Wilcox
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Since the LRU is two words, this does not affect the double-word
alignment of SLUB's freelist.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 102 +++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f0ccb699641d..935944c7438d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -72,6 +72,57 @@ struct hmm;
 struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
+	/*
+	 * WARNING: bit 0 of the first word encode PageTail(). That means
+	 * the rest users of the storage space MUST NOT use the bit to
+	 * avoid collision and false-positive PageTail().
+	 */
+	union {
+		struct list_head lru;	/* Pageout list, eg. active_list
+					 * protected by zone_lru_lock !
+					 * Can be used as a generic list
+					 * by the page owner.
+					 */
+		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
+					    * lru or handled by a slab
+					    * allocator, this points to the
+					    * hosting device page map.
+					    */
+		struct {		/* slub per cpu partial pages */
+			struct page *next;	/* Next partial slab */
+#ifdef CONFIG_64BIT
+			int pages;	/* Nr of partial slabs left */
+			int pobjects;	/* Approximate # of objects */
+#else
+			short int pages;
+			short int pobjects;
+#endif
+		};
+
+		struct rcu_head rcu_head;	/* Used by SLAB
+						 * when destroying via RCU
+						 */
+		/* Tail pages of compound page */
+		struct {
+			unsigned long compound_head; /* If bit zero is set */
+
+			/* First tail page only */
+			unsigned char compound_dtor;
+			unsigned char compound_order;
+			/* two/six bytes available here */
+		};
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+		struct {
+			unsigned long __pad;	/* do not overlay pmd_huge_pte
+						 * with compound_head to avoid
+						 * possible bit 0 collision.
+						 */
+			pgtable_t pmd_huge_pte; /* protected by page->ptl */
+		};
+#endif
+	};
+
 	union {		/* This union is three words (12/24 bytes) in size */
 		struct {	/* Page cache and anonymous pages */
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
@@ -133,57 +184,6 @@ struct page {
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
 	atomic_t _refcount;
 
-	/*
-	 * WARNING: bit 0 of the first word encode PageTail(). That means
-	 * the rest users of the storage space MUST NOT use the bit to
-	 * avoid collision and false-positive PageTail().
-	 */
-	union {
-		struct list_head lru;	/* Pageout list, eg. active_list
-					 * protected by zone_lru_lock !
-					 * Can be used as a generic list
-					 * by the page owner.
-					 */
-		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
-					    * lru or handled by a slab
-					    * allocator, this points to the
-					    * hosting device page map.
-					    */
-		struct {		/* slub per cpu partial pages */
-			struct page *next;	/* Next partial slab */
-#ifdef CONFIG_64BIT
-			int pages;	/* Nr of partial slabs left */
-			int pobjects;	/* Approximate # of objects */
-#else
-			short int pages;
-			short int pobjects;
-#endif
-		};
-
-		struct rcu_head rcu_head;	/* Used by SLAB
-						 * when destroying via RCU
-						 */
-		/* Tail pages of compound page */
-		struct {
-			unsigned long compound_head; /* If bit zero is set */
-
-			/* First tail page only */
-			unsigned char compound_dtor;
-			unsigned char compound_order;
-			/* two/six bytes available here */
-		};
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
-		struct {
-			unsigned long __pad;	/* do not overlay pmd_huge_pte
-						 * with compound_head to avoid
-						 * possible bit 0 collision.
-						 */
-			pgtable_t pmd_huge_pte; /* protected by page->ptl */
-		};
-#endif
-	};
-
 #ifdef CONFIG_MEMCG
 	struct mem_cgroup *mem_cgroup;
 #endif
-- 
2.17.0


* [PATCH v4 11/16] mm: Combine first two unions in struct page
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (9 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 10/16] mm: Move lru union within struct page Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 12/16] mm: Improve struct page documentation Matthew Wilcox
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

This gives us five words of space in a single union in struct page.
The compound_mapcount moves position (from offset 24 to offset 20)
on 64-bit systems, but that does not seem likely to cause any trouble.
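
The new offset falls out of the layout; a 64-bit stand-in (atomic_t
modelled as int) shows where the padding puts it:

	#include <assert.h>
	#include <stddef.h>

	struct tail_page {			/* first tail page view */
		unsigned long flags;		/* offset 0 */
		struct {
			unsigned long compound_head;	/* offset 8 */
			unsigned char compound_dtor;	/* offset 16 */
			unsigned char compound_order;	/* offset 17 */
			int compound_mapcount;	/* padded to offset 20 */
		};
	};

	static_assert(offsetof(struct tail_page, compound_mapcount) == 20,
		      "compound_mapcount lands at offset 20 on 64-bit");

	int main(void) { return 0; }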

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 92 ++++++++++++++++++----------------------
 mm/page_alloc.c          |  2 +-
 2 files changed, 43 insertions(+), 51 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 935944c7438d..1d1552767a89 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -73,58 +73,19 @@ struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
 	/*
-	 * WARNING: bit 0 of the first word encode PageTail(). That means
-	 * the rest users of the storage space MUST NOT use the bit to
+	 * Five words (20/40 bytes) are available in this union.
+	 * WARNING: bit 0 of the first word is used for PageTail(). That
+	 * means the other users of this union MUST NOT use the bit to
 	 * avoid collision and false-positive PageTail().
 	 */
 	union {
-		struct list_head lru;	/* Pageout list, eg. active_list
-					 * protected by zone_lru_lock !
-					 * Can be used as a generic list
-					 * by the page owner.
-					 */
-		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
-					    * lru or handled by a slab
-					    * allocator, this points to the
-					    * hosting device page map.
-					    */
-		struct {		/* slub per cpu partial pages */
-			struct page *next;	/* Next partial slab */
-#ifdef CONFIG_64BIT
-			int pages;	/* Nr of partial slabs left */
-			int pobjects;	/* Approximate # of objects */
-#else
-			short int pages;
-			short int pobjects;
-#endif
-		};
-
-		struct rcu_head rcu_head;	/* Used by SLAB
-						 * when destroying via RCU
-						 */
-		/* Tail pages of compound page */
-		struct {
-			unsigned long compound_head; /* If bit zero is set */
-
-			/* First tail page only */
-			unsigned char compound_dtor;
-			unsigned char compound_order;
-			/* two/six bytes available here */
-		};
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
-		struct {
-			unsigned long __pad;	/* do not overlay pmd_huge_pte
-						 * with compound_head to avoid
-						 * possible bit 0 collision.
-						 */
-			pgtable_t pmd_huge_pte; /* protected by page->ptl */
-		};
-#endif
-	};
-
-	union {		/* This union is three words (12/24 bytes) in size */
 		struct {	/* Page cache and anonymous pages */
+			/**
+			 * @lru: Pageout list, eg. active_list protected by
+			 * zone_lru_lock.  Sometimes used as a generic list
+			 * by the page owner.
+			 */
+			struct list_head lru;
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
 			pgoff_t index;		/* Our offset within mapping. */
@@ -137,11 +98,20 @@ struct page {
 			unsigned long private;
 		};
 		struct {	/* slab, slob and slub */
+			struct list_head slab_list;	/* shared with lru */
 			struct kmem_cache *slab_cache;	/* (slub) */
 			void *freelist;		/* first free object (slub) */
 			void *s_mem;		/* first object */
 		};
 		struct {	/* slub */
+			struct page *next;	/* Next partial slab */
+#ifdef CONFIG_64BIT
+			int pages;	/* Nr of partial slabs left */
+			int pobjects;	/* Approximate # of objects */
+#else
+			short int pages;
+			short int pobjects;
+#endif
 			struct kmem_cache *slub_cache;	/* shared with slab */
 			/* Double-word boundary */
 			void *slub_freelist;		/* shared with slab */
@@ -149,9 +119,22 @@ struct page {
 			unsigned objects:15;
 			unsigned frozen:1;
 		};
-		atomic_t compound_mapcount;	/* first tail page */
-		struct list_head deferred_list; /* second tail page */
+		struct {	/* Tail pages of compound page */
+			unsigned long compound_head;	/* Bit zero is set */
+
+			/* First tail page only */
+			unsigned char compound_dtor;
+			unsigned char compound_order;
+			atomic_t compound_mapcount;
+		};
+		struct {	/* Second tail page of compound page */
+			unsigned long _compound_pad_1;	/* compound_head */
+			unsigned long _compound_pad_2;
+			struct list_head deferred_list;
+		};
 		struct {	/* Page table pages */
+			unsigned long _pt_pad_1;	/* compound_head */
+			pgtable_t pmd_huge_pte; /* protected by page->ptl */
 			unsigned long _pt_pad_2;	/* mapping */
 			unsigned long _pt_pad_3;
 #if ALLOC_SPLIT_PTLOCKS
@@ -160,6 +143,15 @@ struct page {
 			spinlock_t ptl;
 #endif
 		};
+
+		/** @rcu_head: You can use this to free a page by RCU. */
+		struct rcu_head rcu_head;
+
+		/**
+		 * @pgmap: For ZONE_DEVICE pages, this points to the hosting
+		 * device page map.
+		 */
+		struct dev_pagemap *pgmap;
 	};
 
 	union {		/* This union is 4 bytes in size. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 18720eccbce1..d1e4df7d57bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -944,7 +944,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	}
 	switch (page - head_page) {
 	case 1:
-		/* the first tail page: ->mapping is compound_mapcount() */
+		/* the first tail page: ->mapping may be compound_mapcount() */
 		if (unlikely(compound_mapcount(page))) {
 			bad_page(page, "nonzero compound_mapcount", 0);
 			goto out;
-- 
2.17.0


* [PATCH v4 12/16] mm: Improve struct page documentation
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (10 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 11/16] mm: Combine first two unions in " Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 13/16] mm: Add pt_mm to struct page Matthew Wilcox
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Rewrite the documentation to describe what you can use in struct
page rather than what you can't.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
---
 include/linux/mm_types.h | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1d1552767a89..e0e74e91f3e8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -33,29 +33,27 @@ struct hmm;
  * it to keep track of whatever it is we are using the page for at the
  * moment. Note that we have no way to track which tasks are using
  * a page, though if it is a pagecache page, rmap structures can tell us
- * who is mapping it. If you allocate the page using alloc_pages(), you
- * can use some of the space in struct page for your own purposes.
+ * who is mapping it.
  *
- * Pages that were once in the page cache may be found under the RCU lock
- * even after they have been recycled to a different purpose.  The page
- * cache reads and writes some of the fields in struct page to pin the
- * page before checking that it's still in the page cache.  It is vital
- * that all users of struct page:
- * 1. Use the first word as PageFlags.
- * 2. Clear or preserve bit 0 of page->compound_head.  It is used as
- *    PageTail for compound pages, and the page cache must not see false
- *    positives.  Some users put a pointer here (guaranteed to be at least
- *    4-byte aligned), other users avoid using the field altogether.
- * 3. page->_refcount must either not be used, or must be used in such a
- *    way that other CPUs temporarily incrementing and then decrementing the
- *    refcount does not cause problems.  On receiving the page from
- *    alloc_pages(), the refcount will be positive.
- * 4. Either preserve page->_mapcount or restore it to -1 before freeing it.
+ * If you allocate the page using alloc_pages(), you can use some of the
+ * space in struct page for your own purposes.  The five words in the first
+ * union are available, except for bit 0 of the first word which must be
+ * kept clear.  Many users use this word to store a pointer to an object
+ * which is guaranteed to be aligned.  If you use the same storage as
+ * page->mapping, you must restore it to NULL before freeing the page.
  *
- * If you allocate pages of order > 0, you can use the fields in the struct
- * page associated with each page, but bear in mind that the pages may have
- * been inserted individually into the page cache, so you must use the above
- * four fields in a compatible way for each struct page.
+ * If your page will not be mapped to userspace, you can also use the 4
+ * bytes in the second union, but you must call page_mapcount_reset()
+ * before freeing it.
+ *
+ * If you want to use the refcount field, it must be used in such a way
+ * that other CPUs temporarily incrementing and then decrementing the
+ * refcount does not cause problems.  On receiving the page from
+ * alloc_pages(), the refcount will be positive.
+ *
+ * If you allocate pages of order > 0, you can use some of the fields
+ * in each subpage, but you may need to restore some of their values
+ * afterwards.
  *
  * SLUB uses cmpxchg_double() to atomically update its freelist and
  * counters.  That requires that freelist & counters be adjacent and
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 13/16] mm: Add pt_mm to struct page
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (11 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 12/16] mm: Improve struct page documentation Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-05-02  8:12   ` Kirill A. Shutemov
  2018-04-30 20:22 ` [PATCH v4 14/16] mm: Add hmm_data " Matthew Wilcox
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

x86 overloads the page->index field to store a pointer to the mm_struct.
Rename this to pt_mm so it's visible to other users.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 arch/x86/mm/pgtable.c    | 5 ++---
 include/linux/mm_types.h | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..938dbcd46b97 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -114,13 +114,12 @@ static inline void pgd_list_del(pgd_t *pgd)
 
 static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
 {
-	BUILD_BUG_ON(sizeof(virt_to_page(pgd)->index) < sizeof(mm));
-	virt_to_page(pgd)->index = (pgoff_t)mm;
+	virt_to_page(pgd)->pt_mm = mm;
 }
 
 struct mm_struct *pgd_page_get_mm(struct page *page)
 {
-	return (struct mm_struct *)page->index;
+	return page->pt_mm;
 }
 
 static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e0e74e91f3e8..0e6117123737 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -134,7 +134,7 @@ struct page {
 			unsigned long _pt_pad_1;	/* compound_head */
 			pgtable_t pmd_huge_pte; /* protected by page->ptl */
 			unsigned long _pt_pad_2;	/* mapping */
-			unsigned long _pt_pad_3;
+			struct mm_struct *pt_mm;
 #if ALLOC_SPLIT_PTLOCKS
 			spinlock_t *ptl;
 #else
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 14/16] mm: Add hmm_data to struct page
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (12 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 13/16] mm: Add pt_mm to struct page Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
  2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
  15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Make hmm_data an explicit member of the struct page union.
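
For illustration, a driver built on HMM stores one word of per-page
private state through these accessors (hypothetical driver fragment,
not part of this patch):

        /* stash and retrieve driver state in page->hmm_data */
        hmm_devmem_page_set_drvdata(page, (unsigned long)my_state);
        ...
        struct my_state *s = (void *)hmm_devmem_page_get_drvdata(page);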

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/hmm.h      |  8 ++------
 include/linux/mm_types.h | 14 +++++++++-----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 39988924de3a..91c1b2dccbbb 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -522,9 +522,7 @@ void hmm_devmem_remove(struct hmm_devmem *devmem);
 static inline void hmm_devmem_page_set_drvdata(struct page *page,
 					       unsigned long data)
 {
-	unsigned long *drvdata = (unsigned long *)&page->pgmap;
-
-	drvdata[1] = data;
+	page->hmm_data = data;
 }
 
 /*
@@ -535,9 +533,7 @@ static inline void hmm_devmem_page_set_drvdata(struct page *page,
  */
 static inline unsigned long hmm_devmem_page_get_drvdata(const struct page *page)
 {
-	const unsigned long *drvdata = (const unsigned long *)&page->pgmap;
-
-	return drvdata[1];
+	return page->hmm_data;
 }
 
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0e6117123737..42619e16047f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -145,11 +145,15 @@ struct page {
 		/** @rcu_head: You can use this to free a page by RCU. */
 		struct rcu_head rcu_head;
 
-		/**
-		 * @pgmap: For ZONE_DEVICE pages, this points to the hosting
-		 * device page map.
-		 */
-		struct dev_pagemap *pgmap;
+		struct {
+			/**
+			 * @pgmap: For ZONE_DEVICE pages, this points to the
+			 * hosting device page map.
+			 */
+			struct dev_pagemap *pgmap;
+			unsigned long hmm_data;
+			unsigned long _zd_pad_1;	/* uses mapping */
+		};
 	};
 
 	union {		/* This union is 4 bytes in size. */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 15/16] slab,slub: Remove rcu_head size checks
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (13 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 14/16] mm: Add hmm_data " Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-05-01 16:46   ` Christopher Lameter
  2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
  15 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

rcu_head may now grow larger than list_head without affecting slab or
slub.
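
(Both allocators used to overlay the rcu_head on page->lru; slub
additionally reserved tail space in the slab when rcu_head was the
larger of the two.  With rcu_head an explicit member of the struct page
union -- abridged:

        union {
                struct { ... struct list_head lru; ... };
                ...
                struct rcu_head rcu_head;
        };

container_of() on rcu_head works no matter how the two sizes compare,
as the diff below shows.)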

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/slab.c |  2 --
 mm/slub.c | 27 ++-------------------------
 2 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index e387a17d6d56..e6ab1327db25 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1235,8 +1235,6 @@ void __init kmem_cache_init(void)
 {
 	int i;
 
-	BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
-					sizeof(struct rcu_head));
 	kmem_cache = &kmem_cache_boot;
 
 	if (!IS_ENABLED(CONFIG_NUMA) || num_possible_nodes() == 1)
diff --git a/mm/slub.c b/mm/slub.c
index 04625e3dab13..27cc2956acba 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1690,17 +1690,9 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__free_pages(page, order);
 }
 
-#define need_reserve_slab_rcu						\
-	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
-
 static void rcu_free_slab(struct rcu_head *h)
 {
-	struct page *page;
-
-	if (need_reserve_slab_rcu)
-		page = virt_to_head_page(h);
-	else
-		page = container_of((struct list_head *)h, struct page, lru);
+	struct page *page = container_of(h, struct page, rcu_head);
 
 	__free_slab(page->slab_cache, page);
 }
@@ -1708,19 +1700,7 @@ static void rcu_free_slab(struct rcu_head *h)
 static void free_slab(struct kmem_cache *s, struct page *page)
 {
 	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) {
-		struct rcu_head *head;
-
-		if (need_reserve_slab_rcu) {
-			int order = compound_order(page);
-			int offset = (PAGE_SIZE << order) - s->reserved;
-
-			VM_BUG_ON(s->reserved != sizeof(*head));
-			head = page_address(page) + offset;
-		} else {
-			head = &page->rcu_head;
-		}
-
-		call_rcu(head, rcu_free_slab);
+		call_rcu(&page->rcu_head, rcu_free_slab);
 	} else
 		__free_slab(s, page);
 }
@@ -3587,9 +3567,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	s->random = get_random_long();
 #endif
 
-	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
-		s->reserved = sizeof(struct rcu_head);
-
 	if (!calculate_sizes(s, -1))
 		goto error;
 	if (disable_higher_order_debug) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 16/16] slub: Remove kmem_cache->reserved
  2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
                   ` (14 preceding siblings ...)
  2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
  2018-05-01 16:43   ` Christopher Lameter
  15 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

The reserved field was only used for embedding an rcu_head in the data
structure.  With the previous commit, we no longer need it.  That lets
us remove the 'reserved' argument to a lot of functions.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/slub_def.h |  1 -
 mm/slub.c                | 41 ++++++++++++++++++++--------------------
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 3773e26c08c1..09fa2c6f0e68 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -101,7 +101,6 @@ struct kmem_cache {
 	void (*ctor)(void *);
 	unsigned int inuse;		/* Offset to metadata */
 	unsigned int align;		/* Alignment */
-	unsigned int reserved;		/* Reserved bytes at the end of slabs */
 	unsigned int red_left_pad;	/* Left redzone padding size */
 	const char *name;	/* Name (only for display!) */
 	struct list_head list;	/* List of slab caches */
diff --git a/mm/slub.c b/mm/slub.c
index 27cc2956acba..01c2183aa3d7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -317,16 +317,16 @@ static inline unsigned int slab_index(void *p, struct kmem_cache *s, void *addr)
 	return (p - addr) / s->size;
 }
 
-static inline unsigned int order_objects(unsigned int order, unsigned int size, unsigned int reserved)
+static inline unsigned int order_objects(unsigned int order, unsigned int size)
 {
-	return (((unsigned int)PAGE_SIZE << order) - reserved) / size;
+	return ((unsigned int)PAGE_SIZE << order) / size;
 }
 
 static inline struct kmem_cache_order_objects oo_make(unsigned int order,
-		unsigned int size, unsigned int reserved)
+		unsigned int size)
 {
 	struct kmem_cache_order_objects x = {
-		(order << OO_SHIFT) + order_objects(order, size, reserved)
+		(order << OO_SHIFT) + order_objects(order, size)
 	};
 
 	return x;
@@ -841,7 +841,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
 		return 1;
 
 	start = page_address(page);
-	length = (PAGE_SIZE << compound_order(page)) - s->reserved;
+	length = PAGE_SIZE << compound_order(page);
 	end = start + length;
 	remainder = length % s->size;
 	if (!remainder)
@@ -930,7 +930,7 @@ static int check_slab(struct kmem_cache *s, struct page *page)
 		return 0;
 	}
 
-	maxobj = order_objects(compound_order(page), s->size, s->reserved);
+	maxobj = order_objects(compound_order(page), s->size);
 	if (page->objects > maxobj) {
 		slab_err(s, page, "objects %u > max %u",
 			page->objects, maxobj);
@@ -980,7 +980,7 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
 		nr++;
 	}
 
-	max_objects = order_objects(compound_order(page), s->size, s->reserved);
+	max_objects = order_objects(compound_order(page), s->size);
 	if (max_objects > MAX_OBJS_PER_PAGE)
 		max_objects = MAX_OBJS_PER_PAGE;
 
@@ -3197,21 +3197,21 @@ static unsigned int slub_min_objects;
  */
 static inline unsigned int slab_order(unsigned int size,
 		unsigned int min_objects, unsigned int max_order,
-		unsigned int fract_leftover, unsigned int reserved)
+		unsigned int fract_leftover)
 {
 	unsigned int min_order = slub_min_order;
 	unsigned int order;
 
-	if (order_objects(min_order, size, reserved) > MAX_OBJS_PER_PAGE)
+	if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
 		return get_order(size * MAX_OBJS_PER_PAGE) - 1;
 
-	for (order = max(min_order, (unsigned int)get_order(min_objects * size + reserved));
+	for (order = max(min_order, (unsigned int)get_order(min_objects * size));
 			order <= max_order; order++) {
 
 		unsigned int slab_size = (unsigned int)PAGE_SIZE << order;
 		unsigned int rem;
 
-		rem = (slab_size - reserved) % size;
+		rem = slab_size % size;
 
 		if (rem <= slab_size / fract_leftover)
 			break;
@@ -3220,7 +3220,7 @@ static inline unsigned int slab_order(unsigned int size,
 	return order;
 }
 
-static inline int calculate_order(unsigned int size, unsigned int reserved)
+static inline int calculate_order(unsigned int size)
 {
 	unsigned int order;
 	unsigned int min_objects;
@@ -3237,7 +3237,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 	min_objects = slub_min_objects;
 	if (!min_objects)
 		min_objects = 4 * (fls(nr_cpu_ids) + 1);
-	max_objects = order_objects(slub_max_order, size, reserved);
+	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
 	while (min_objects > 1) {
@@ -3246,7 +3246,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 		fraction = 16;
 		while (fraction >= 4) {
 			order = slab_order(size, min_objects,
-					slub_max_order, fraction, reserved);
+					slub_max_order, fraction);
 			if (order <= slub_max_order)
 				return order;
 			fraction /= 2;
@@ -3258,14 +3258,14 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 	 * We were unable to place multiple objects in a slab. Now
 	 * lets see if we can place a single object there.
 	 */
-	order = slab_order(size, 1, slub_max_order, 1, reserved);
+	order = slab_order(size, 1, slub_max_order, 1);
 	if (order <= slub_max_order)
 		return order;
 
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
-	order = slab_order(size, 1, MAX_ORDER, 1, reserved);
+	order = slab_order(size, 1, MAX_ORDER, 1);
 	if (order < MAX_ORDER)
 		return order;
 	return -ENOSYS;
@@ -3533,7 +3533,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	if (forced_order >= 0)
 		order = forced_order;
 	else
-		order = calculate_order(size, s->reserved);
+		order = calculate_order(size);
 
 	if ((int)order < 0)
 		return 0;
@@ -3551,8 +3551,8 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	/*
 	 * Determine the number of objects per slab
 	 */
-	s->oo = oo_make(order, size, s->reserved);
-	s->min = oo_make(get_order(size), size, s->reserved);
+	s->oo = oo_make(order, size);
+	s->min = oo_make(get_order(size), size);
 	if (oo_objects(s->oo) > oo_objects(s->max))
 		s->max = s->oo;
 
@@ -3562,7 +3562,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 {
 	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
-	s->reserved = 0;
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
 	s->random = get_random_long();
 #endif
@@ -5106,7 +5105,7 @@ SLAB_ATTR_RO(destroy_by_rcu);
 
 static ssize_t reserved_show(struct kmem_cache *s, char *buf)
 {
-	return sprintf(buf, "%u\n", s->reserved);
+	return sprintf(buf, "0\n");
 }
 SLAB_ATTR_RO(reserved);
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 16/16] slub: Remove kmem_cache->reserved
  2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
@ 2018-05-01 16:43   ` Christopher Lameter
  0 siblings, 0 replies; 30+ messages in thread
From: Christopher Lameter @ 2018-05-01 16:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Mon, 30 Apr 2018, Matthew Wilcox wrote:

> The reserved field was only used for embedding an rcu_head in the data
> structure.  With the previous commit, we no longer need it.  That lets
> us remove the 'reserved' argument to a lot of functions.

Great work!

Acked-by: Christoph Lameter <cl@linux.com>

> @@ -5106,7 +5105,7 @@ SLAB_ATTR_RO(destroy_by_rcu);
>
>  static ssize_t reserved_show(struct kmem_cache *s, char *buf)
>  {
> -	return sprintf(buf, "%u\n", s->reserved);
> +	return sprintf(buf, "0\n");
>  }
>  SLAB_ATTR_RO(reserved);


Hmmm... Maybe it's better if you remove the reserved file from sysfs
instead? I doubt anyone was using it.
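
That would amount to something like this (a sketch only, assuming the
attribute is wired up in slab_attrs[] in the usual way):

        -static ssize_t reserved_show(struct kmem_cache *s, char *buf)
        -{
        -       return sprintf(buf, "0\n");
        -}
        -SLAB_ATTR_RO(reserved);
        ...
        -       &reserved_attr.attr,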

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 15/16] slab,slub: Remove rcu_head size checks
  2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
@ 2018-05-01 16:46   ` Christopher Lameter
  0 siblings, 0 replies; 30+ messages in thread
From: Christopher Lameter @ 2018-05-01 16:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Mon, 30 Apr 2018, Matthew Wilcox wrote:

> rcu_head may now grow larger than list_head without affecting slab or
> slub.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-04-30 20:22 ` [PATCH v4 07/16] slub: Remove page->counters Matthew Wilcox
@ 2018-05-01 16:48   ` Christopher Lameter
  2018-05-02 17:26     ` Matthew Wilcox
  0 siblings, 1 reply; 30+ messages in thread
From: Christopher Lameter @ 2018-05-01 16:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Mon, 30 Apr 2018, Matthew Wilcox wrote:

> Use page->private instead, now that these two fields are in the same
> location.  Include a compile-time assert that the fields don't get out
> of sync.

Hrm. This makes the source code a bit less readable. Guess it's ok.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 13/16] mm: Add pt_mm to struct page
  2018-04-30 20:22 ` [PATCH v4 13/16] mm: Add pt_mm to struct page Matthew Wilcox
@ 2018-05-02  8:12   ` Kirill A. Shutemov
  2018-05-03  0:11     ` Matthew Wilcox
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill A. Shutemov @ 2018-05-02  8:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Mon, Apr 30, 2018 at 01:22:44PM -0700, Matthew Wilcox wrote:
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index e0e74e91f3e8..0e6117123737 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -134,7 +134,7 @@ struct page {
>  			unsigned long _pt_pad_1;	/* compound_head */
>  			pgtable_t pmd_huge_pte; /* protected by page->ptl */
>  			unsigned long _pt_pad_2;	/* mapping */
> -			unsigned long _pt_pad_3;
> +			struct mm_struct *pt_mm;

I guess it's worth having a comment that this field is only used for pgd
page tables and therefore doesn't conflict with pmd_huge_pte.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-01 16:48   ` Christopher Lameter
@ 2018-05-02 17:26     ` Matthew Wilcox
  2018-05-02 22:17       ` Kirill A. Shutemov
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-02 17:26 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Tue, May 01, 2018 at 11:48:53AM -0500, Christopher Lameter wrote:
> On Mon, 30 Apr 2018, Matthew Wilcox wrote:
> 
> > Use page->private instead, now that these two fields are in the same
> > location.  Include a compile-time assert that the fields don't get out
> > of sync.
> 
> Hrm. This makes the source code a bit less readable. Guess it's ok.
> 
> Acked-by: Christoph Lameter <cl@linux.com>

Thanks for the ACK.  I'm not thrilled with this particular patch, but
I'm not thrilled with any of the other options we've come up with either.

Option 1:

Patch as written.
Pro: Keeps struct page simple
Con: Hidden dependency on page->private and page->inuse being in the same bits

Option 2:

@@ -113,9 +113,14 @@ struct page {
                        struct kmem_cache *slub_cache;  /* shared with slab */
                        /* Double-word boundary */
                        void *slub_freelist;            /* shared with slab */
-                       unsigned inuse:16;
-                       unsigned objects:15;
-                       unsigned frozen:1;
+                       union {
+                               unsigned long counters;
+                               struct {
+                                       unsigned inuse:16;
+                                       unsigned objects:15;
+                                       unsigned frozen:1;
+                               };
+                       };
                };
                struct {        /* Tail pages of compound page */
                        unsigned long compound_head;    /* Bit zero is set */

Pro: Expresses exactly what we do.
Con: Back to five levels of indentation in struct page

Option 3: Use -fms-extensions to create a slub_page structure.

Pro: Indentation reduced to minimum and no cross-union dependencies
Con: Nobody seemed interested in the idea
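
For reference, a minimal sketch of what option 3 could look like (the
slub_data name is hypothetical; -fms-extensions allows a tagged struct
to be included anonymously):

        struct slub_data {
                unsigned inuse:16;
                unsigned objects:15;
                unsigned frozen:1;
        };

        struct page {
                ...
                union {
                        ...
                        struct slub_data;       /* anonymous; needs -fms-extensions */
                };
        };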

Option 4: Use explicit shifting-and-masking to combine the three counters
into one word.

Con: Lots of churn.
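
(Option 4 would replace the bitfields with explicit helpers along these
lines -- names hypothetical:

        #define SLUB_INUSE(c)   ((c) & 0xffff)
        #define SLUB_OBJECTS(c) (((c) >> 16) & 0x7fff)
        #define SLUB_FROZEN(c)  (((c) >> 31) & 1)

and every reader and writer of inuse/objects/frozen would have to be
converted, hence the churn.)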

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-02 17:26     ` Matthew Wilcox
@ 2018-05-02 22:17       ` Kirill A. Shutemov
  2018-05-03  0:52         ` Matthew Wilcox
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill A. Shutemov @ 2018-05-02 22:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christopher Lameter, linux-mm, Matthew Wilcox, Andrew Morton,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Wed, May 02, 2018 at 05:26:39PM +0000, Matthew Wilcox wrote:
> Option 2:
> 
> @@ -113,9 +113,14 @@ struct page {
>                         struct kmem_cache *slub_cache;  /* shared with slab */
>                         /* Double-word boundary */
>                         void *slub_freelist;            /* shared with slab */
> -                       unsigned inuse:16;
> -                       unsigned objects:15;
> -                       unsigned frozen:1;
> +                       union {
> +                               unsigned long counters;
> +                               struct {
> +                                       unsigned inuse:16;
> +                                       unsigned objects:15;
> +                                       unsigned frozen:1;
> +                               };
> +                       };
>                 };
>                 struct {        /* Tail pages of compound page */
>                         unsigned long compound_head;    /* Bit zero is set */
> 
> Pro: Expresses exactly what we do.
> Con: Back to five levels of indentation in struct page

The indentation issue can be fixed (to some extent) by declaring the union
outside struct page and just using it inside.

I don't advocate for the approach, just listing the option.
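
Something like this, I mean (the name is made up):

        union slub_counters {
                unsigned long counters;
                struct {
                        unsigned inuse:16;
                        unsigned objects:15;
                        unsigned frozen:1;
                };
        };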

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 13/16] mm: Add pt_mm to struct page
  2018-05-02  8:12   ` Kirill A. Shutemov
@ 2018-05-03  0:11     ` Matthew Wilcox
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-03  0:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Wed, May 02, 2018 at 11:12:17AM +0300, Kirill A. Shutemov wrote:
> On Mon, Apr 30, 2018 at 01:22:44PM -0700, Matthew Wilcox wrote:
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index e0e74e91f3e8..0e6117123737 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -134,7 +134,7 @@ struct page {
> >  			unsigned long _pt_pad_1;	/* compound_head */
> >  			pgtable_t pmd_huge_pte; /* protected by page->ptl */
> >  			unsigned long _pt_pad_2;	/* mapping */
> > -			unsigned long _pt_pad_3;
> > +			struct mm_struct *pt_mm;
> 
> I guess it's worth having a comment that this field is only used for pgd
> page tables and therefore doesn't conflict with pmd_huge_pte.

Actually, it doesn't conflict with pmd_huge_pte -- the two live in
different words (both before and after this patch).  What does 'conflict'
with pmd_huge_pte is the use of page->lru in the pgd.  I have a plan to
eliminate that use of page->lru, but I need to do a couple of other
things first.
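
For reference, the page-table arm of the union lays out like this after
the patch (word offsets within struct page):

        word 0: _pt_pad_1       /* aliases compound_head; bit 0 clear */
        word 1: pmd_huge_pte
        word 2: _pt_pad_2       /* aliases mapping */
        word 3: pt_mm
        word 4: ptl (or spinlock_t *ptl with ALLOC_SPLIT_PTLOCKS)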

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-02 22:17       ` Kirill A. Shutemov
@ 2018-05-03  0:52         ` Matthew Wilcox
  2018-05-03 15:03           ` Christopher Lameter
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-03  0:52 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Christopher Lameter, linux-mm, Matthew Wilcox, Andrew Morton,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Thu, May 03, 2018 at 01:17:02AM +0300, Kirill A. Shutemov wrote:
> On Wed, May 02, 2018 at 05:26:39PM +0000, Matthew Wilcox wrote:
> > Option 2:
> > +                       union {
> > +                               unsigned long counters;
> > +                               struct {
> > +                                       unsigned inuse:16;
> > +                                       unsigned objects:15;
> > +                                       unsigned frozen:1;
> > +                               };
> > +                       };
> > 
> > Pro: Expresses exactly what we do.
> > Con: Back to five levels of indentation in struct page
> 
> The indentation issue can be fixed (to some extend) by declaring the union
> outside struct page and just use it inside.
> 
> I don't advocate for the approach, just listing the option.

Actually, you can't have an anonymous tagged union without -fms-extensions
(which got zero comments when I proposed it to lkml) or -fplan9-extensions
(which would require gcc 4.6 or newer).
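
That is, something like this does not work without one of those flags:

        union counters { unsigned long raw; };

        struct page {
                ...
                union counters; /* "declaration does not declare anything" */
        };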

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-03  0:52         ` Matthew Wilcox
@ 2018-05-03 15:03           ` Christopher Lameter
  2018-05-03 18:28             ` Matthew Wilcox
  0 siblings, 1 reply; 30+ messages in thread
From: Christopher Lameter @ 2018-05-03 15:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Wed, 2 May 2018, Matthew Wilcox wrote:

> > > Option 2:
> > > +                       union {
> > > +                               unsigned long counters;
> > > +                               struct {
> > > +                                       unsigned inuse:16;
> > > +                                       unsigned objects:15;
> > > +                                       unsigned frozen:1;
> > > +                               };
> > > +                       };
> > >
> > > Pro: Expresses exactly what we do.
> > > Con: Back to five levels of indentation in struct page

I like that better. Improves readability of the code using struct page. I
think that is more important than the actual definition of struct page.

Given how overloaded struct page already is, this will require some
deep thought for newbies anyway. ;-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-03 15:03           ` Christopher Lameter
@ 2018-05-03 18:28             ` Matthew Wilcox
  2018-05-04 14:55               ` Christopher Lameter
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-03 18:28 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Thu, May 03, 2018 at 10:03:10AM -0500, Christopher Lameter wrote:
> On Wed, 2 May 2018, Matthew Wilcox wrote:
> 
> > > > Option 2:
> > > > +                       union {
> > > > +                               unsigned long counters;
> > > > +                               struct {
> > > > +                                       unsigned inuse:16;
> > > > +                                       unsigned objects:15;
> > > > +                                       unsigned frozen:1;
> > > > +                               };
> > > > +                       };
> > > >
> > > > Pro: Expresses exactly what we do.
> > > > Con: Back to five levels of indentation in struct page
> 
> I like that better. Improves readability of the code using struct page. I
> think that is more important than the actual definition of struct page.

OK.  Do you want the conversion of slub to using slub_freelist and slub_list
as part of this patch series as well, then?

The end result looks like this, btw:

                struct {        /* slub */ 
                        union {
                                struct list_head slub_list;
                                struct {
                                        struct page *next; /* Next partial */
#ifdef CONFIG_64BIT
                                        int pages;      /* Nr of pages left */
                                        int pobjects;   /* Apprx # of objects */
#else
                                        short int pages;
                                        short int pobjects;
#endif
                                };
                        };
                        struct kmem_cache *slub_cache;  /* shared with slab */
                        /* Double-word boundary */
                        void *slub_freelist;            /* shared with slab */
                        union {
                                unsigned long counters;
                                struct {
                                        unsigned inuse:16;
                                        unsigned objects:15;
                                        unsigned frozen:1;
                                };
                        };
                };

Oh, and what do you want to do about cache_from_obj() in mm/slab.h?
That relies on having slab_cache be in the same location in struct
page as slub_cache.  Maybe something like this?

        page = virt_to_head_page(x);
#ifdef CONFIG_SLUB
        cachep = page->slub_cache;
#else
        cachep = page->slab_cache;
#endif
        if (slab_equal_or_root(cachep, s))
                return cachep;

> Given how overloaded struct page already is, this will require some
> deep thought for newbies anyway. ;-)

Yes, it's all quite entangled.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-03 18:28             ` Matthew Wilcox
@ 2018-05-04 14:55               ` Christopher Lameter
  2018-05-04 15:15                 ` Matthew Wilcox
  0 siblings, 1 reply; 30+ messages in thread
From: Christopher Lameter @ 2018-05-04 14:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Thu, 3 May 2018, Matthew Wilcox wrote:

> OK.  Do you want the conversion of slub to using slub_freelist and slub_list
> as part of this patch series as well, then?

Not sure that is needed. Don't like allocator-specific names.

> Oh, and what do you want to do about cache_from_obj() in mm/slab.h?
> That relies on having slab_cache be in the same location in struct
> page as slub_cache.  Maybe something like this?
>
>         page = virt_to_head_page(x);
> #ifdef CONFIG_SLUB
>         cachep = page->slub_cache;
> #else
>         cachep = page->slab_cache;
> #endif
>         if (slab_equal_or_root(cachep, s))
>                 return cachep;

Name the field "cache" instead of sl?b_cache?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-04 14:55               ` Christopher Lameter
@ 2018-05-04 15:15                 ` Matthew Wilcox
  2018-05-04 16:29                   ` Christopher Lameter
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-04 15:15 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Fri, May 04, 2018 at 09:55:30AM -0500, Christopher Lameter wrote:
> On Thu, 3 May 2018, Matthew Wilcox wrote:
> 
> > OK.  Do you want the conversion of slub to using slub_freelist and slub_list
> > as part of this patch series as well, then?
> 
> Not sure if that is needed. Dont like allocator specific names.

So you'd rather have one union that's used for slab/slob/slub?  Like this?

                struct {        /* slab, slob and slub */
                        union {
                                struct list_head slab_list;
                                struct {        /* Partial pages */
                                        struct page *next;
#ifdef CONFIG_64BIT
                                        int pages;      /* Nr of pages left */
                                        int pobjects;   /* Approximate count */
#else
                                        short int pages;
                                        short int pobjects;
#endif
                                };
                        };
                        struct kmem_cache *slab_cache;
                        /* Double-word boundary */
                        void *freelist;         /* first free object */
                        union {
                                void *s_mem;    /* first object (slab only) */
                                unsigned long counters; /* slub */
                                struct {
                                        unsigned inuse:16;
                                        unsigned objects:15;
                                        unsigned frozen:1;
                                };
                        };
                };

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/16] slub: Remove page->counters
  2018-05-04 15:15                 ` Matthew Wilcox
@ 2018-05-04 16:29                   ` Christopher Lameter
  0 siblings, 0 replies; 30+ messages in thread
From: Christopher Lameter @ 2018-05-04 16:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
	Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Fri, 4 May 2018, Matthew Wilcox wrote:

> So you'd rather have one union that's used for slab/slob/slub?  Like this?

Yup that looks better.
>
>                 struct {        /* slab, slob and slub */
>                         union {
>                                 struct list_head slab_list;
>                                 struct {        /* Partial pages */
>                                         struct page *next;
> #ifdef CONFIG_64BIT
>                                         int pages;      /* Nr of pages left */
>                                         int pobjects;   /* Approximate count */
> #else
>                                         short int pages;
>                                         short int pobjects;
> #endif
>                                 };
>                         };
>                         struct kmem_cache *slab_cache;
>                         /* Double-word boundary */
>                         void *freelist;         /* first free object */
>                         union {
>                                 void *s_mem;    /* first object (slab only) */
>                                 unsigned long counters; /* slub */
>                                 struct {
>                                         unsigned inuse:16;
>                                         unsigned objects:15;
>                                         unsigned frozen:1;
>                                 };
>                         };
>                 };
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread (newest message: 2018-05-04 16:29 UTC)

Thread overview: 30+ messages
-- links below jump to the message on this page --
2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 01/16] s390: Use _refcount for pgtables Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 02/16] mm: Split page_type out from _mapcount Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 03/16] mm: Mark pages in use for page tables Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 04/16] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 05/16] mm: Move 'private' union within " Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 06/16] mm: Move _refcount out of struct page union Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 07/16] slub: Remove page->counters Matthew Wilcox
2018-05-01 16:48   ` Christopher Lameter
2018-05-02 17:26     ` Matthew Wilcox
2018-05-02 22:17       ` Kirill A. Shutemov
2018-05-03  0:52         ` Matthew Wilcox
2018-05-03 15:03           ` Christopher Lameter
2018-05-03 18:28             ` Matthew Wilcox
2018-05-04 14:55               ` Christopher Lameter
2018-05-04 15:15                 ` Matthew Wilcox
2018-05-04 16:29                   ` Christopher Lameter
2018-04-30 20:22 ` [PATCH v4 08/16] mm: Combine first three unions in struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 09/16] mm: Use page->deferred_list Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 10/16] mm: Move lru union within struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 11/16] mm: Combine first two unions in " Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 12/16] mm: Improve struct page documentation Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 13/16] mm: Add pt_mm to struct page Matthew Wilcox
2018-05-02  8:12   ` Kirill A. Shutemov
2018-05-03  0:11     ` Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 14/16] mm: Add hmm_data " Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
2018-05-01 16:46   ` Christopher Lameter
2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
2018-05-01 16:43   ` Christopher Lameter
