Linux-mm Archive on lore.kernel.org
* [PATCH v6 00/17] Rearrange struct page
@ 2018-05-18 19:45 Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 01/17] s390: Use _refcount for pgtables Matthew Wilcox
                   ` (16 more replies)
  0 siblings, 17 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

As presented at LSFMM, this patch-set rearranges struct page to give
more contiguous usable space to users who have allocated a struct page
for their own purposes.  For a graphical view of before-and-after, see
the first two tabs of https://docs.google.com/spreadsheets/d/1tvCszs_7FXrjei9_mtFiKV6nW1FLnYyvPvW-qNZhdog/edit?usp=sharing

Highlights:
 - deferred_list now really exists in struct page instead of just a comment.
 - hmm_data also exists in struct page instead of being a nasty hack.
 - x86's PGD pages have a real pointer to the mm_struct.
 - VMalloc pages now have all sorts of extra information stored in them
   to help with debugging and tuning.
 - rcu_head is no longer tied to slab in case anyone else wants to
   free pages by RCU.
 - slub's counters no longer share space with _refcount.
 - slub's freelist+counters are now naturally dword aligned.
 - slub loses a parameter to a lot of functions and a sysfs file.

Changes in v6:
 - More acks added
 - Incorporated Vlastimil's suggestions to improve the comments

Changes in v5:
 - Added acks from Christoph Lameter
 - Dropped patch to make slub use page->private instead of page->counters.
 - Combined slab/slob/slub into one union in struct page.
 - Added patch to distinguish VMalloc pages.
 - Added patch to remove slub's 'reserved' file in sysfs.
 - Call the unions 'main union' and 'mapcount union' instead of 'first
   union' and 'second union'.
 - Removed a line which described which double-word slub's freelist was in.

Changes in v4:
 - Added acks/reviews from Kirill & Randy
 - Removed call to page_mapcount_reset from slub since it no longer uses
   mapcount union.
 - Add pt_mm and hmm_data to struct page

Matthew Wilcox (17):
  s390: Use _refcount for pgtables
  mm: Split page_type out from _mapcount
  mm: Mark pages in use for page tables
  mm: Switch s_mem and slab_cache in struct page
  mm: Move 'private' union within struct page
  mm: Move _refcount out of struct page union
  mm: Combine first three unions in struct page
  mm: Use page->deferred_list
  mm: Move lru union within struct page
  mm: Combine LRU and main union in struct page
  mm: Improve struct page documentation
  mm: Add pt_mm to struct page
  mm: Add hmm_data to struct page
  slab,slub: Remove rcu_head size checks
  slub: Remove kmem_cache->reserved
  slub: Remove 'reserved' file from sysfs
  mm: Distinguish VMalloc pages

 arch/s390/mm/pgalloc.c                 |  21 ++-
 arch/x86/mm/pgtable.c                  |   5 +-
 fs/proc/page.c                         |   4 +
 include/linux/hmm.h                    |   8 +-
 include/linux/mm.h                     |   2 +
 include/linux/mm_types.h               | 236 ++++++++++++-------------
 include/linux/page-flags.h             |  76 ++++++--
 include/linux/slub_def.h               |   1 -
 include/uapi/linux/kernel-page-flags.h |   3 +-
 kernel/crash_core.c                    |   1 +
 mm/huge_memory.c                       |   7 +-
 mm/page_alloc.c                        |  17 +-
 mm/slab.c                              |   2 -
 mm/slub.c                              | 102 +++--------
 mm/vmalloc.c                           |   5 +-
 scripts/tags.sh                        |   6 +-
 tools/vm/page-types.c                  |   2 +
 17 files changed, 241 insertions(+), 257 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6 01/17] s390: Use _refcount for pgtables
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 02/17] mm: Split page_type out from _mapcount Matthew Wilcox
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

s390 borrows the storage used for _mapcount in struct page in order to
account whether the bottom or top half is being used for 2kB page
tables.  I want to use that for something else, so use the top byte of
_refcount instead of the bottom byte of _mapcount.  _refcount may
temporarily be incremented by other CPUs that see a stale pointer to
this page in the page cache, but each CPU can only increment it by one,
and there are no systems with 2^24 CPUs today, so they will not change
the upper byte of _refcount.  We do have to be a little careful not to
lose any of their writes (as they will subsequently decrement the
counter).

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/mm/pgalloc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 562f72955956..84bd6329a88d 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -190,14 +190,15 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 		if (!list_empty(&mm->context.pgtable_list)) {
 			page = list_first_entry(&mm->context.pgtable_list,
 						struct page, lru);
-			mask = atomic_read(&page->_mapcount);
+			mask = atomic_read(&page->_refcount) >> 24;
 			mask = (mask | (mask >> 4)) & 3;
 			if (mask != 3) {
 				table = (unsigned long *) page_to_phys(page);
 				bit = mask & 1;		/* =1 -> second 2K */
 				if (bit)
 					table += PTRS_PER_PTE;
-				atomic_xor_bits(&page->_mapcount, 1U << bit);
+				atomic_xor_bits(&page->_refcount,
+							1U << (bit + 24));
 				list_del(&page->lru);
 			}
 		}
@@ -218,12 +219,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	table = (unsigned long *) page_to_phys(page);
 	if (mm_alloc_pgste(mm)) {
 		/* Return 4K page table with PGSTEs */
-		atomic_set(&page->_mapcount, 3);
+		atomic_xor_bits(&page->_refcount, 3 << 24);
 		memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
 		memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
 	} else {
 		/* Return the first 2K fragment of the page */
-		atomic_set(&page->_mapcount, 1);
+		atomic_xor_bits(&page->_refcount, 1 << 24);
 		memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
 		spin_lock_bh(&mm->context.lock);
 		list_add(&page->lru, &mm->context.pgtable_list);
@@ -242,7 +243,8 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 		/* Free 2K page table fragment of a 4K page */
 		bit = (__pa(table) & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t));
 		spin_lock_bh(&mm->context.lock);
-		mask = atomic_xor_bits(&page->_mapcount, 1U << bit);
+		mask = atomic_xor_bits(&page->_refcount, 1U << (bit + 24));
+		mask >>= 24;
 		if (mask & 3)
 			list_add(&page->lru, &mm->context.pgtable_list);
 		else
@@ -253,7 +255,6 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	}
 
 	pgtable_page_dtor(page);
-	atomic_set(&page->_mapcount, -1);
 	__free_page(page);
 }
 
@@ -274,7 +275,8 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 	}
 	bit = (__pa(table) & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t));
 	spin_lock_bh(&mm->context.lock);
-	mask = atomic_xor_bits(&page->_mapcount, 0x11U << bit);
+	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
+	mask >>= 24;
 	if (mask & 3)
 		list_add_tail(&page->lru, &mm->context.pgtable_list);
 	else
@@ -296,12 +298,13 @@ static void __tlb_remove_table(void *_table)
 		break;
 	case 1:		/* lower 2K of a 4K page table */
 	case 2:		/* higher 2K of a 4K page table */
-		if (atomic_xor_bits(&page->_mapcount, mask << 4) != 0)
+		mask = atomic_xor_bits(&page->_refcount, mask << (4 + 24));
+		mask >>= 24;
+		if (mask != 0)
 			break;
 		/* fallthrough */
 	case 3:		/* 4K page table with pgstes */
 		pgtable_page_dtor(page);
-		atomic_set(&page->_mapcount, -1);
 		__free_page(page);
 		break;
 	}
-- 
2.17.0

* [PATCH v6 02/17] mm: Split page_type out from _mapcount
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 01/17] s390: Use _refcount for pgtables Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 03/17] mm: Mark pages in use for page tables Matthew Wilcox
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

We're already using a union of many fields here, so stop abusing the
_mapcount and make page_type its own field.  That implies renaming some
of the machinery that creates PageBuddy, PageBalloon and PageKmemcg;
bring back the PG_buddy, PG_balloon and PG_kmemcg names.

As suggested by Kirill, make page_type a bitmask.  Because it starts out
life as -1 (thanks to sharing the storage with _mapcount), setting a
page flag means clearing the appropriate bit.  This gives us space for
probably twenty or so extra bits (depending how paranoid we want to be
about _mapcount underflow).

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm_types.h   | 13 ++++++-----
 include/linux/page-flags.h | 45 ++++++++++++++++++++++----------------
 kernel/crash_core.c        |  1 +
 mm/page_alloc.c            | 13 +++++------
 scripts/tags.sh            |  6 ++---
 5 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21612347d311..41828fb34860 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -96,6 +96,14 @@ struct page {
 	};
 
 	union {
+		/*
+		 * If the page is neither PageSlab nor mappable to userspace,
+		 * the value stored here may help determine what this page
+		 * is used for.  See page-flags.h for a list of page types
+		 * which are currently stored here.
+		 */
+		unsigned int page_type;
+
 		_slub_counter_t counters;
 		unsigned int active;		/* SLAB */
 		struct {			/* SLUB */
@@ -109,11 +117,6 @@ struct page {
 			/*
 			 * Count of ptes mapped in mms, to show when
 			 * page is mapped & limit reverse map searches.
-			 *
-			 * Extra information about page type may be
-			 * stored here for pages that are never mapped,
-			 * in which case the value MUST BE <= -2.
-			 * See page-flags.h for more details.
 			 */
 			atomic_t _mapcount;
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..8c25b28a35aa 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -642,49 +642,56 @@ PAGEFLAG_FALSE(DoubleMap)
 #endif
 
 /*
- * For pages that are never mapped to userspace, page->mapcount may be
- * used for storing extra information about page type. Any value used
- * for this purpose must be <= -2, but it's better start not too close
- * to -2 so that an underflow of the page_mapcount() won't be mistaken
- * for a special page.
+ * For pages that are never mapped to userspace (and aren't PageSlab),
+ * page_type may be used.  Because it is initialised to -1, we invert the
+ * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
+ * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
+ * low bits so that an underflow or overflow of page_mapcount() won't be
+ * mistaken for a page type value.
  */
-#define PAGE_MAPCOUNT_OPS(uname, lname)					\
+
+#define PAGE_TYPE_BASE	0xf0000000
+/* Reserve		0x0000007f to catch underflows of page_mapcount */
+#define PG_buddy	0x00000080
+#define PG_balloon	0x00000100
+#define PG_kmemcg	0x00000200
+
+#define PageType(page, flag)						\
+	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+
+#define PAGE_TYPE_OPS(uname, lname)					\
 static __always_inline int Page##uname(struct page *page)		\
 {									\
-	return atomic_read(&page->_mapcount) ==				\
-				PAGE_##lname##_MAPCOUNT_VALUE;		\
+	return PageType(page, PG_##lname);				\
 }									\
 static __always_inline void __SetPage##uname(struct page *page)		\
 {									\
-	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);	\
-	atomic_set(&page->_mapcount, PAGE_##lname##_MAPCOUNT_VALUE);	\
+	VM_BUG_ON_PAGE(!PageType(page, 0), page);			\
+	page->page_type &= ~PG_##lname;					\
 }									\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 {									\
 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
-	atomic_set(&page->_mapcount, -1);				\
+	page->page_type |= PG_##lname;					\
 }
 
 /*
- * PageBuddy() indicate that the page is free and in the buddy system
+ * PageBuddy() indicates that the page is free and in the buddy system
  * (see mm/page_alloc.c).
  */
-#define PAGE_BUDDY_MAPCOUNT_VALUE		(-128)
-PAGE_MAPCOUNT_OPS(Buddy, BUDDY)
+PAGE_TYPE_OPS(Buddy, buddy)
 
 /*
- * PageBalloon() is set on pages that are on the balloon page list
+ * PageBalloon() is true for pages that are on the balloon page list
  * (see mm/balloon_compaction.c).
  */
-#define PAGE_BALLOON_MAPCOUNT_VALUE		(-256)
-PAGE_MAPCOUNT_OPS(Balloon, BALLOON)
+PAGE_TYPE_OPS(Balloon, balloon)
 
 /*
  * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
  * pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
  */
-#define PAGE_KMEMCG_MAPCOUNT_VALUE		(-512)
-PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
+PAGE_TYPE_OPS(Kmemcg, kmemcg)
 
 extern bool is_free_buddy_page(struct page *page);
 
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index f7674d676889..b66aced5e8c2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -460,6 +460,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_hwpoison);
 #endif
 	VMCOREINFO_NUMBER(PG_head_mask);
+#define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 #ifdef CONFIG_HUGETLB_PAGE
 	VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5ee6256e31d0..da3eb2236ba1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -686,16 +686,14 @@ static inline void rmv_page_order(struct page *page)
 
 /*
  * This function checks whether a page is free && is the buddy
- * we can do coalesce a page and its buddy if
+ * we can coalesce a page and its buddy if
  * (a) the buddy is not in a hole (check before calling!) &&
  * (b) the buddy is in the buddy system &&
  * (c) a page and its buddy have the same order &&
  * (d) a page and its buddy are in the same zone.
  *
- * For recording whether a page is in the buddy system, we set ->_mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE.
- * Setting, clearing, and testing _mapcount PAGE_BUDDY_MAPCOUNT_VALUE is
- * serialized by zone->lock.
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
  *
  * For recording page's order, we use page_private(page).
  */
@@ -740,9 +738,8 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
  * as necessary, plus some accounting needed to play nicely with other
  * parts of the VM system.
  * At each level, we keep a list of pages, which are heads of continuous
- * free pages of length of (1 << order) and marked with _mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE. Page's order is recorded in page_private(page)
- * field.
+ * free pages of length of (1 << order) and marked with PageBuddy.
+ * Page's order is recorded in page_private(page) field.
  * So when we are allocating or freeing one, we can derive the state of the
  * other.  That is, if we allocate a small block, and both were
  * free, the remainder of the region must be split into blocks.
diff --git a/scripts/tags.sh b/scripts/tags.sh
index 78e546ff689c..8c3ae36d4ea8 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -188,9 +188,9 @@ regex_c=(
 	'/\<CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/ClearPage\1/'
 	'/\<__CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/__ClearPage\1/'
 	'/\<TESTCLEARFLAG_FALSE(\([[:alnum:]_]*\).*/TestClearPage\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/Page\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
-	'/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/Page\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
+	'/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
 	'/^TASK_PFA_TEST([^,]*, *\([[:alnum:]_]*\))/task_\1/'
 	'/^TASK_PFA_SET([^,]*, *\([[:alnum:]_]*\))/task_set_\1/'
 	'/^TASK_PFA_CLEAR([^,]*, *\([[:alnum:]_]*\))/task_clear_\1/'
-- 
2.17.0

* [PATCH v6 03/17] mm: Mark pages in use for page tables
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 01/17] s390: Use _refcount for pgtables Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 02/17] mm: Split page_type out from _mapcount Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 04/17] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables.  This can be helpful when debugging crashdumps or
analysing memory fragmentation.  Add a KPF flag to report these pages
to userspace and update page-types.c to interpret that flag.

Note that only pages currently accounted as NR_PAGETABLE are tracked
as PageTable; this does not include pgd/p4d/pud/pmd pages.  Those will
be the subject of a later patch.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 fs/proc/page.c                         | 2 ++
 include/linux/mm.h                     | 2 ++
 include/linux/page-flags.h             | 6 ++++++
 include/uapi/linux/kernel-page-flags.h | 2 +-
 tools/vm/page-types.c                  | 1 +
 5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 1491918a33c3..792c78a49174 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -154,6 +154,8 @@ u64 stable_page_flags(struct page *page)
 
 	if (PageBalloon(page))
 		u |= 1 << KPF_BALLOON;
+	if (PageTable(page))
+		u |= 1 << KPF_PGTABLE;
 
 	if (page_is_idle(page))
 		u |= 1 << KPF_IDLE;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bd6588479d36..d4e9286a6402 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1819,6 +1819,7 @@ static inline bool pgtable_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
 		return false;
+	__SetPageTable(page);
 	inc_zone_page_state(page, NR_PAGETABLE);
 	return true;
 }
@@ -1826,6 +1827,7 @@ static inline bool pgtable_page_ctor(struct page *page)
 static inline void pgtable_page_dtor(struct page *page)
 {
 	pte_lock_deinit(page);
+	__ClearPageTable(page);
 	dec_zone_page_state(page, NR_PAGETABLE);
 }
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8c25b28a35aa..901943e4754b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -655,6 +655,7 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PG_buddy	0x00000080
 #define PG_balloon	0x00000100
 #define PG_kmemcg	0x00000200
+#define PG_table	0x00000400
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -693,6 +694,11 @@ PAGE_TYPE_OPS(Balloon, balloon)
  */
 PAGE_TYPE_OPS(Kmemcg, kmemcg)
 
+/*
+ * Marks pages in use as page tables.
+ */
+PAGE_TYPE_OPS(Table, table)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index fa139841ec18..21b9113c69da 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -35,6 +35,6 @@
 #define KPF_BALLOON		23
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
-
+#define KPF_PGTABLE		26
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index a8783f48f77f..cce853dca691 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -131,6 +131,7 @@ static const char * const page_flag_names[] = {
 	[KPF_KSM]		= "x:ksm",
 	[KPF_THP]		= "t:thp",
 	[KPF_BALLOON]		= "o:balloon",
+	[KPF_PGTABLE]		= "g:pgtable",
 	[KPF_ZERO_PAGE]		= "z:zero_page",
 	[KPF_IDLE]              = "i:idle_page",
 
-- 
2.17.0

* [PATCH v6 04/17] mm: Switch s_mem and slab_cache in struct page
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (2 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 03/17] mm: Mark pages in use for page tables Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 05/17] mm: Move 'private' union within " Matthew Wilcox
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

This will allow us to store slub's counters in the same bits as slab's
s_mem.  slub now needs to set page->mapping to NULL as it frees the page,
just like slab does.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm_types.h | 4 ++--
 mm/slub.c                | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 41828fb34860..e97a310a6abe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -83,7 +83,7 @@ struct page {
 		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
 		struct address_space *mapping;
 
-		void *s_mem;			/* slab first object */
+		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
 		atomic_t compound_mapcount;	/* first tail page */
 		/* page_deferred_list().next	 -- second tail page */
 	};
@@ -194,7 +194,7 @@ struct page {
 		spinlock_t ptl;
 #endif
 #endif
-		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
+		void *s_mem;			/* slab first object */
 	};
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/slub.c b/mm/slub.c
index e938184ac847..7fc13c46e975 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1690,6 +1690,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__ClearPageSlab(page);
 
 	page_mapcount_reset(page);
+	page->mapping = NULL;
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
 	memcg_uncharge_slab(page, order, s);
-- 
2.17.0

* [PATCH v6 05/17] mm: Move 'private' union within struct page
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (3 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 04/17] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 06/17] mm: Move _refcount out of struct page union Matthew Wilcox
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

By moving page->private to the fourth word of struct page, we can put
the SLUB counters in the same word as SLAB's s_mem and still do the
cmpxchg_double trick.  Now the SLUB counters no longer overlap with the
mapcount or refcount so we can drop the call to page_mapcount_reset()
and simplify set_page_slub_counters() to a single line.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 56 ++++++++++++++++++----------------------
 mm/slub.c                | 20 ++------------
 2 files changed, 27 insertions(+), 49 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e97a310a6abe..23378a789af4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -65,15 +65,9 @@ struct hmm;
  */
 #ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
 #define _struct_page_alignment	__aligned(2 * sizeof(unsigned long))
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE)
-#define _slub_counter_t		unsigned long
 #else
-#define _slub_counter_t		unsigned int
-#endif
-#else /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
 #define _struct_page_alignment
-#define _slub_counter_t		unsigned int
-#endif /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
+#endif
 
 struct page {
 	/* First double word block */
@@ -95,6 +89,30 @@ struct page {
 		/* page_deferred_list().prev	-- second tail page */
 	};
 
+	union {
+		/*
+		 * Mapping-private opaque data:
+		 * Usually used for buffer_heads if PagePrivate
+		 * Used for swp_entry_t if PageSwapCache
+		 * Indicates order in the buddy system if PageBuddy
+		 */
+		unsigned long private;
+#if USE_SPLIT_PTE_PTLOCKS
+#if ALLOC_SPLIT_PTLOCKS
+		spinlock_t *ptl;
+#else
+		spinlock_t ptl;
+#endif
+#endif
+		void *s_mem;			/* slab first object */
+		unsigned long counters;		/* SLUB */
+		struct {			/* SLUB */
+			unsigned inuse:16;
+			unsigned objects:15;
+			unsigned frozen:1;
+		};
+	};
+
 	union {
 		/*
 		 * If the page is neither PageSlab nor mappable to userspace,
@@ -104,13 +122,7 @@ struct page {
 		 */
 		unsigned int page_type;
 
-		_slub_counter_t counters;
 		unsigned int active;		/* SLAB */
-		struct {			/* SLUB */
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
-		};
 		int units;			/* SLOB */
 
 		struct {			/* Page cache */
@@ -179,24 +191,6 @@ struct page {
 #endif
 	};
 
-	union {
-		/*
-		 * Mapping-private opaque data:
-		 * Usually used for buffer_heads if PagePrivate
-		 * Used for swp_entry_t if PageSwapCache
-		 * Indicates order in the buddy system if PageBuddy
-		 */
-		unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
-#if ALLOC_SPLIT_PTLOCKS
-		spinlock_t *ptl;
-#else
-		spinlock_t ptl;
-#endif
-#endif
-		void *s_mem;			/* slab first object */
-	};
-
 #ifdef CONFIG_MEMCG
 	struct mem_cgroup *mem_cgroup;
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 7fc13c46e975..05ca612a5fe6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -356,21 +356,6 @@ static __always_inline void slab_unlock(struct page *page)
 	__bit_spin_unlock(PG_locked, &page->flags);
 }
 
-static inline void set_page_slub_counters(struct page *page, unsigned long counters_new)
-{
-	struct page tmp;
-	tmp.counters = counters_new;
-	/*
-	 * page->counters can cover frozen/inuse/objects as well
-	 * as page->_refcount.  If we assign to ->counters directly
-	 * we run the risk of losing updates to page->_refcount, so
-	 * be careful and only assign to the fields we need.
-	 */
-	page->frozen  = tmp.frozen;
-	page->inuse   = tmp.inuse;
-	page->objects = tmp.objects;
-}
-
 /* Interrupts must be disabled (for the fallback code to work right) */
 static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		void *freelist_old, unsigned long counters_old,
@@ -392,7 +377,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
 		if (page->freelist == freelist_old &&
 					page->counters == counters_old) {
 			page->freelist = freelist_new;
-			set_page_slub_counters(page, counters_new);
+			page->counters = counters_new;
 			slab_unlock(page);
 			return true;
 		}
@@ -431,7 +416,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		if (page->freelist == freelist_old &&
 					page->counters == counters_old) {
 			page->freelist = freelist_new;
-			set_page_slub_counters(page, counters_new);
+			page->counters = counters_new;
 			slab_unlock(page);
 			local_irq_restore(flags);
 			return true;
@@ -1689,7 +1674,6 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
 
-	page_mapcount_reset(page);
 	page->mapping = NULL;
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
-- 
2.17.0

* [PATCH v6 06/17] mm: Move _refcount out of struct page union
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (4 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 05/17] mm: Move 'private' union within " Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 07/17] mm: Combine first three unions in struct page Matthew Wilcox
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Keeping the refcount in the union only encourages people to put
something else in the union which will overlap with _refcount and
eventually explode messily.  pahole reports no fields change location.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 23378a789af4..9828cd170251 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -113,7 +113,13 @@ struct page {
 		};
 	};
 
-	union {
+	union {		/* This union is 4 bytes in size. */
+		/*
+		 * If the page can be mapped to userspace, encodes the number
+		 * of times this page is referenced by a page table.
+		 */
+		atomic_t _mapcount;
+
 		/*
 		 * If the page is neither PageSlab nor mappable to userspace,
 		 * the value stored here may help determine what this page
@@ -124,22 +130,11 @@ struct page {
 
 		unsigned int active;		/* SLAB */
 		int units;			/* SLOB */
-
-		struct {			/* Page cache */
-			/*
-			 * Count of ptes mapped in mms, to show when
-			 * page is mapped & limit reverse map searches.
-			 */
-			atomic_t _mapcount;
-
-			/*
-			 * Usage count, *USE WRAPPER FUNCTION* when manual
-			 * accounting. See page_ref.h
-			 */
-			atomic_t _refcount;
-		};
 	};
 
+	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
+	atomic_t _refcount;
+
 	/*
 	 * WARNING: bit 0 of the first word encode PageTail(). That means
 	 * the rest users of the storage space MUST NOT use the bit to
-- 
2.17.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6 07/17] mm: Combine first three unions in struct page
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (5 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 06/17] mm: Move _refcount out of struct page union Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 08/17] mm: Use page->deferred_list Matthew Wilcox
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

By combining these three one-word unions into one three-word union,
we make it easier for users to add their own multi-word fields to struct
page, as well as making it obvious that SLUB needs to keep its double-word
alignment for its freelist & counters.

No field moves position; verified with pahole.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm_types.h | 66 ++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9828cd170251..629a7b568ed7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -70,46 +70,46 @@ struct hmm;
 #endif
 
 struct page {
-	/* First double word block */
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
+	/* Three words (12/24 bytes) are available in this union. */
 	union {
-		/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
-		struct address_space *mapping;
-
-		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
+		struct {	/* Page cache and anonymous pages */
+			/* See page-flags.h for PAGE_MAPPING_FLAGS */
+			struct address_space *mapping;
+			pgoff_t index;		/* Our offset within mapping. */
+			/**
+			 * @private: Mapping-private opaque data.
+			 * Usually used for buffer_heads if PagePrivate.
+			 * Used for swp_entry_t if PageSwapCache.
+			 * Indicates order in the buddy system if PageBuddy.
+			 */
+			unsigned long private;
+		};
+		struct {	/* slab, slob and slub */
+			struct kmem_cache *slab_cache; /* not slob */
+			/* Double-word boundary */
+			void *freelist;		/* first free object */
+			union {
+				void *s_mem;	/* slab: first object */
+				unsigned long counters;		/* SLUB */
+				struct {			/* SLUB */
+					unsigned inuse:16;
+					unsigned objects:15;
+					unsigned frozen:1;
+				};
+			};
+		};
 		atomic_t compound_mapcount;	/* first tail page */
-		/* page_deferred_list().next	 -- second tail page */
-	};
-
-	/* Second double word */
-	union {
-		pgoff_t index;		/* Our offset within mapping. */
-		void *freelist;		/* sl[aou]b first free object */
-		/* page_deferred_list().prev	-- second tail page */
-	};
-
-	union {
-		/*
-		 * Mapping-private opaque data:
-		 * Usually used for buffer_heads if PagePrivate
-		 * Used for swp_entry_t if PageSwapCache
-		 * Indicates order in the buddy system if PageBuddy
-		 */
-		unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
+		struct list_head deferred_list; /* second tail page */
+		struct {	/* Page table pages */
+			unsigned long _pt_pad_2;	/* mapping */
+			unsigned long _pt_pad_3;
 #if ALLOC_SPLIT_PTLOCKS
-		spinlock_t *ptl;
+			spinlock_t *ptl;
 #else
-		spinlock_t ptl;
-#endif
+			spinlock_t ptl;
 #endif
-		void *s_mem;			/* slab first object */
-		unsigned long counters;		/* SLUB */
-		struct {			/* SLUB */
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
 		};
 	};
 
-- 
2.17.0



* [PATCH v6 08/17] mm: Use page->deferred_list
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (6 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 07/17] mm: Combine first three unions in struct page Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 09/17] mm: Move lru union within struct page Matthew Wilcox
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Now that we can represent the location of 'deferred_list' in C instead
of comments, make use of that ability.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 7 ++-----
 mm/page_alloc.c  | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a3a1815f8e11..cb0954a6de88 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -483,11 +483,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 static inline struct list_head *page_deferred_list(struct page *page)
 {
-	/*
-	 * ->lru in the tail pages is occupied by compound_head.
-	 * Let's use ->mapping + ->index in the second tail page as list_head.
-	 */
-	return (struct list_head *)&page[2].mapping;
+	/* ->lru in the tail pages is occupied by compound_head. */
+	return &page[2].deferred_list;
 }
 
 void prep_transhuge_page(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index da3eb2236ba1..1a0149c4f672 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -933,7 +933,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	case 2:
 		/*
 		 * the second tail page: ->mapping is
-		 * page_deferred_list().next -- ignore value.
+		 * deferred_list.next -- ignore value.
 		 */
 		break;
 	default:
-- 
2.17.0



* [PATCH v6 09/17] mm: Move lru union within struct page
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (7 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 08/17] mm: Use page->deferred_list Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 10/17] mm: Combine LRU and main union in " Matthew Wilcox
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Since struct list_head is two words, moving the LRU union ahead of the
main union does not affect the double-word alignment of SLUB's freelist.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 102 +++++++++++++++++++--------------------
 mm/slub.c                |   8 +--
 2 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 629a7b568ed7..b6a3948195d3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -72,6 +72,57 @@ struct hmm;
 struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
+	/*
+	 * WARNING: bit 0 of the first word encode PageTail(). That means
+	 * the rest users of the storage space MUST NOT use the bit to
+	 * avoid collision and false-positive PageTail().
+	 */
+	union {
+		struct list_head lru;	/* Pageout list, eg. active_list
+					 * protected by zone_lru_lock !
+					 * Can be used as a generic list
+					 * by the page owner.
+					 */
+		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
+					    * lru or handled by a slab
+					    * allocator, this points to the
+					    * hosting device page map.
+					    */
+		struct {		/* slub per cpu partial pages */
+			struct page *next;	/* Next partial slab */
+#ifdef CONFIG_64BIT
+			int pages;	/* Nr of partial slabs left */
+			int pobjects;	/* Approximate # of objects */
+#else
+			short int pages;
+			short int pobjects;
+#endif
+		};
+
+		struct rcu_head rcu_head;	/* Used by SLAB
+						 * when destroying via RCU
+						 */
+		/* Tail pages of compound page */
+		struct {
+			unsigned long compound_head; /* If bit zero is set */
+
+			/* First tail page only */
+			unsigned char compound_dtor;
+			unsigned char compound_order;
+			/* two/six bytes available here */
+		};
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+		struct {
+			unsigned long __pad;	/* do not overlay pmd_huge_pte
+						 * with compound_head to avoid
+						 * possible bit 0 collision.
+						 */
+			pgtable_t pmd_huge_pte; /* protected by page->ptl */
+		};
+#endif
+	};
+
 	/* Three words (12/24 bytes) are available in this union. */
 	union {
 		struct {	/* Page cache and anonymous pages */
@@ -135,57 +186,6 @@ struct page {
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
 	atomic_t _refcount;
 
-	/*
-	 * WARNING: bit 0 of the first word encode PageTail(). That means
-	 * the rest users of the storage space MUST NOT use the bit to
-	 * avoid collision and false-positive PageTail().
-	 */
-	union {
-		struct list_head lru;	/* Pageout list, eg. active_list
-					 * protected by zone_lru_lock !
-					 * Can be used as a generic list
-					 * by the page owner.
-					 */
-		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
-					    * lru or handled by a slab
-					    * allocator, this points to the
-					    * hosting device page map.
-					    */
-		struct {		/* slub per cpu partial pages */
-			struct page *next;	/* Next partial slab */
-#ifdef CONFIG_64BIT
-			int pages;	/* Nr of partial slabs left */
-			int pobjects;	/* Approximate # of objects */
-#else
-			short int pages;
-			short int pobjects;
-#endif
-		};
-
-		struct rcu_head rcu_head;	/* Used by SLAB
-						 * when destroying via RCU
-						 */
-		/* Tail pages of compound page */
-		struct {
-			unsigned long compound_head; /* If bit zero is set */
-
-			/* First tail page only */
-			unsigned char compound_dtor;
-			unsigned char compound_order;
-			/* two/six bytes available here */
-		};
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
-		struct {
-			unsigned long __pad;	/* do not overlay pmd_huge_pte
-						 * with compound_head to avoid
-						 * possible bit 0 collision.
-						 */
-			pgtable_t pmd_huge_pte; /* protected by page->ptl */
-		};
-#endif
-	};
-
 #ifdef CONFIG_MEMCG
 	struct mem_cgroup *mem_cgroup;
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 05ca612a5fe6..57a20f995220 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -52,11 +52,11 @@
  *   and to synchronize major metadata changes to slab cache structures.
  *
  *   The slab_lock is only used for debugging and on arches that do not
- *   have the ability to do a cmpxchg_double. It only protects the second
- *   double word in the page struct. Meaning
+ *   have the ability to do a cmpxchg_double. It only protects:
  *	A. page->freelist	-> List of object free in a page
- *	B. page->counters	-> Counters of objects
- *	C. page->frozen		-> frozen state
+ *	B. page->inuse		-> Number of objects in use
+ *	C. page->objects	-> Number of objects in page
+ *	D. page->frozen		-> frozen state
  *
  *   If a slab is frozen then it is exempt from list management. It is not
  *   on any list. The processor that froze the slab is the one who can
-- 
2.17.0



* [PATCH v6 10/17] mm: Combine LRU and main union in struct page
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (8 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 09/17] mm: Move lru union within struct page Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 11/17] mm: Improve struct page documentation Matthew Wilcox
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

This gives us five words of space in a single union in struct page.
The compound_mapcount moves position (from offset 24 to offset 20)
on 64-bit systems, but that does not seem likely to cause any trouble.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_types.h | 97 +++++++++++++++++++---------------------
 mm/page_alloc.c          |  2 +-
 2 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index b6a3948195d3..cf3bbee8c9a1 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -73,59 +73,19 @@ struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
 	/*
-	 * WARNING: bit 0 of the first word encode PageTail(). That means
-	 * the rest users of the storage space MUST NOT use the bit to
+	 * Five words (20/40 bytes) are available in this union.
+	 * WARNING: bit 0 of the first word is used for PageTail(). That
+	 * means the other users of this union MUST NOT use the bit to
 	 * avoid collision and false-positive PageTail().
 	 */
-	union {
-		struct list_head lru;	/* Pageout list, eg. active_list
-					 * protected by zone_lru_lock !
-					 * Can be used as a generic list
-					 * by the page owner.
-					 */
-		struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
-					    * lru or handled by a slab
-					    * allocator, this points to the
-					    * hosting device page map.
-					    */
-		struct {		/* slub per cpu partial pages */
-			struct page *next;	/* Next partial slab */
-#ifdef CONFIG_64BIT
-			int pages;	/* Nr of partial slabs left */
-			int pobjects;	/* Approximate # of objects */
-#else
-			short int pages;
-			short int pobjects;
-#endif
-		};
-
-		struct rcu_head rcu_head;	/* Used by SLAB
-						 * when destroying via RCU
-						 */
-		/* Tail pages of compound page */
-		struct {
-			unsigned long compound_head; /* If bit zero is set */
-
-			/* First tail page only */
-			unsigned char compound_dtor;
-			unsigned char compound_order;
-			/* two/six bytes available here */
-		};
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
-		struct {
-			unsigned long __pad;	/* do not overlay pmd_huge_pte
-						 * with compound_head to avoid
-						 * possible bit 0 collision.
-						 */
-			pgtable_t pmd_huge_pte; /* protected by page->ptl */
-		};
-#endif
-	};
-
-	/* Three words (12/24 bytes) are available in this union. */
 	union {
 		struct {	/* Page cache and anonymous pages */
+			/**
+			 * @lru: Pageout list, eg. active_list protected by
+			 * zone_lru_lock.  Sometimes used as a generic list
+			 * by the page owner.
+			 */
+			struct list_head lru;
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
 			pgoff_t index;		/* Our offset within mapping. */
@@ -138,6 +98,19 @@ struct page {
 			unsigned long private;
 		};
 		struct {	/* slab, slob and slub */
+			union {
+				struct list_head slab_list;	/* uses lru */
+				struct {	/* Partial pages */
+					struct page *next;
+#ifdef CONFIG_64BIT
+					int pages;	/* Nr of pages left */
+					int pobjects;	/* Approximate count */
+#else
+					short int pages;
+					short int pobjects;
+#endif
+				};
+			};
 			struct kmem_cache *slab_cache; /* not slob */
 			/* Double-word boundary */
 			void *freelist;		/* first free object */
@@ -151,9 +124,22 @@ struct page {
 				};
 			};
 		};
-		atomic_t compound_mapcount;	/* first tail page */
-		struct list_head deferred_list; /* second tail page */
+		struct {	/* Tail pages of compound page */
+			unsigned long compound_head;	/* Bit zero is set */
+
+			/* First tail page only */
+			unsigned char compound_dtor;
+			unsigned char compound_order;
+			atomic_t compound_mapcount;
+		};
+		struct {	/* Second tail page of compound page */
+			unsigned long _compound_pad_1;	/* compound_head */
+			unsigned long _compound_pad_2;
+			struct list_head deferred_list;
+		};
 		struct {	/* Page table pages */
+			unsigned long _pt_pad_1;	/* compound_head */
+			pgtable_t pmd_huge_pte; /* protected by page->ptl */
 			unsigned long _pt_pad_2;	/* mapping */
 			unsigned long _pt_pad_3;
 #if ALLOC_SPLIT_PTLOCKS
@@ -162,6 +148,15 @@ struct page {
 			spinlock_t ptl;
 #endif
 		};
+
+		/** @rcu_head: You can use this to free a page by RCU. */
+		struct rcu_head rcu_head;
+
+		/**
+		 * @pgmap: For ZONE_DEVICE pages, this points to the hosting
+		 * device page map.
+		 */
+		struct dev_pagemap *pgmap;
 	};
 
 	union {		/* This union is 4 bytes in size. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1a0149c4f672..787440218def 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -924,7 +924,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	}
 	switch (page - head_page) {
 	case 1:
-		/* the first tail page: ->mapping is compound_mapcount() */
+		/* the first tail page: ->mapping may be compound_mapcount() */
 		if (unlikely(compound_mapcount(page))) {
 			bad_page(page, "nonzero compound_mapcount", 0);
 			goto out;
-- 
2.17.0



* [PATCH v6 11/17] mm: Improve struct page documentation
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (9 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 10/17] mm: Combine LRU and main union in " Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 12/17] mm: Add pt_mm to struct page Matthew Wilcox
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Rewrite the documentation to describe what you can use in struct
page rather than what you can't.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm_types.h | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cf3bbee8c9a1..90a6dbeeef11 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -33,29 +33,27 @@ struct hmm;
  * it to keep track of whatever it is we are using the page for at the
  * moment. Note that we have no way to track which tasks are using
  * a page, though if it is a pagecache page, rmap structures can tell us
- * who is mapping it. If you allocate the page using alloc_pages(), you
- * can use some of the space in struct page for your own purposes.
+ * who is mapping it.
  *
- * Pages that were once in the page cache may be found under the RCU lock
- * even after they have been recycled to a different purpose.  The page
- * cache reads and writes some of the fields in struct page to pin the
- * page before checking that it's still in the page cache.  It is vital
- * that all users of struct page:
- * 1. Use the first word as PageFlags.
- * 2. Clear or preserve bit 0 of page->compound_head.  It is used as
- *    PageTail for compound pages, and the page cache must not see false
- *    positives.  Some users put a pointer here (guaranteed to be at least
- *    4-byte aligned), other users avoid using the field altogether.
- * 3. page->_refcount must either not be used, or must be used in such a
- *    way that other CPUs temporarily incrementing and then decrementing the
- *    refcount does not cause problems.  On receiving the page from
- *    alloc_pages(), the refcount will be positive.
- * 4. Either preserve page->_mapcount or restore it to -1 before freeing it.
+ * If you allocate the page using alloc_pages(), you can use some of the
+ * space in struct page for your own purposes.  The five words in the main
+ * union are available, except for bit 0 of the first word which must be
+ * kept clear.  Many users use this word to store a pointer to an object
+ * which is guaranteed to be aligned.  If you use the same storage as
+ * page->mapping, you must restore it to NULL before freeing the page.
  *
- * If you allocate pages of order > 0, you can use the fields in the struct
- * page associated with each page, but bear in mind that the pages may have
- * been inserted individually into the page cache, so you must use the above
- * four fields in a compatible way for each struct page.
+ * If your page will not be mapped to userspace, you can also use the four
+ * bytes in the mapcount union, but you must call page_mapcount_reset()
+ * before freeing it.
+ *
+ * If you want to use the refcount field, it must be used in such a way
+ * that other CPUs temporarily incrementing and then decrementing the
+ * refcount does not cause problems.  On receiving the page from
+ * alloc_pages(), the refcount will be positive.
+ *
+ * If you allocate pages of order > 0, you can use some of the fields
+ * in each subpage, but you may need to restore some of their values
+ * afterwards.
  *
  * SLUB uses cmpxchg_double() to atomically update its freelist and
  * counters.  That requires that freelist & counters be adjacent and
-- 
2.17.0



* [PATCH v6 12/17] mm: Add pt_mm to struct page
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (10 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 11/17] mm: Improve struct page documentation Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 13/17] mm: Add hmm_data " Matthew Wilcox
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

For pgd page table pages, x86 overloads the page->index field to store
a pointer to the mm_struct.  Rename this to pt_mm so it's visible to
other users.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 arch/x86/mm/pgtable.c    | 5 ++---
 include/linux/mm_types.h | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..938dbcd46b97 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -114,13 +114,12 @@ static inline void pgd_list_del(pgd_t *pgd)
 
 static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
 {
-	BUILD_BUG_ON(sizeof(virt_to_page(pgd)->index) < sizeof(mm));
-	virt_to_page(pgd)->index = (pgoff_t)mm;
+	virt_to_page(pgd)->pt_mm = mm;
 }
 
 struct mm_struct *pgd_page_get_mm(struct page *page)
 {
-	return (struct mm_struct *)page->index;
+	return page->pt_mm;
 }
 
 static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 90a6dbeeef11..7eb7092424b7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,7 @@ struct page {
 			unsigned long _pt_pad_1;	/* compound_head */
 			pgtable_t pmd_huge_pte; /* protected by page->ptl */
 			unsigned long _pt_pad_2;	/* mapping */
-			unsigned long _pt_pad_3;
+			struct mm_struct *pt_mm;	/* x86 pgds only */
 #if ALLOC_SPLIT_PTLOCKS
 			spinlock_t *ptl;
 #else
-- 
2.17.0



* [PATCH v6 13/17] mm: Add hmm_data to struct page
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (11 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 12/17] mm: Add pt_mm to struct page Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 14/17] slab,slub: Remove rcu_head size checks Matthew Wilcox
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Make hmm_data an explicit member of the struct page union.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/hmm.h      |  8 ++------
 include/linux/mm_types.h | 12 ++++++------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 39988924de3a..91c1b2dccbbb 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -522,9 +522,7 @@ void hmm_devmem_remove(struct hmm_devmem *devmem);
 static inline void hmm_devmem_page_set_drvdata(struct page *page,
 					       unsigned long data)
 {
-	unsigned long *drvdata = (unsigned long *)&page->pgmap;
-
-	drvdata[1] = data;
+	page->hmm_data = data;
 }
 
 /*
@@ -535,9 +533,7 @@ static inline void hmm_devmem_page_set_drvdata(struct page *page,
  */
 static inline unsigned long hmm_devmem_page_get_drvdata(const struct page *page)
 {
-	const unsigned long *drvdata = (const unsigned long *)&page->pgmap;
-
-	return drvdata[1];
+	return page->hmm_data;
 }
 
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7eb7092424b7..530a9a2b039b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -146,15 +146,15 @@ struct page {
 			spinlock_t ptl;
 #endif
 		};
+		struct {	/* ZONE_DEVICE pages */
+			/** @pgmap: Points to the hosting device page map. */
+			struct dev_pagemap *pgmap;
+			unsigned long hmm_data;
+			unsigned long _zd_pad_1;	/* uses mapping */
+		};
 
 		/** @rcu_head: You can use this to free a page by RCU. */
 		struct rcu_head rcu_head;
-
-		/**
-		 * @pgmap: For ZONE_DEVICE pages, this points to the hosting
-		 * device page map.
-		 */
-		struct dev_pagemap *pgmap;
 	};
 
 	union {		/* This union is 4 bytes in size. */
-- 
2.17.0



* [PATCH v6 14/17] slab,slub: Remove rcu_head size checks
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (12 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 13/17] mm: Add hmm_data " Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 15/17] slub: Remove kmem_cache->reserved Matthew Wilcox
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

rcu_head may now grow larger than list_head without affecting slab or
slub.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c |  2 --
 mm/slub.c | 27 ++-------------------------
 2 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index e387a17d6d56..e6ab1327db25 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1235,8 +1235,6 @@ void __init kmem_cache_init(void)
 {
 	int i;
 
-	BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
-					sizeof(struct rcu_head));
 	kmem_cache = &kmem_cache_boot;
 
 	if (!IS_ENABLED(CONFIG_NUMA) || num_possible_nodes() == 1)
diff --git a/mm/slub.c b/mm/slub.c
index 57a20f995220..8e2407f69855 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1681,17 +1681,9 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__free_pages(page, order);
 }
 
-#define need_reserve_slab_rcu						\
-	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
-
 static void rcu_free_slab(struct rcu_head *h)
 {
-	struct page *page;
-
-	if (need_reserve_slab_rcu)
-		page = virt_to_head_page(h);
-	else
-		page = container_of((struct list_head *)h, struct page, lru);
+	struct page *page = container_of(h, struct page, rcu_head);
 
 	__free_slab(page->slab_cache, page);
 }
@@ -1699,19 +1691,7 @@ static void rcu_free_slab(struct rcu_head *h)
 static void free_slab(struct kmem_cache *s, struct page *page)
 {
 	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) {
-		struct rcu_head *head;
-
-		if (need_reserve_slab_rcu) {
-			int order = compound_order(page);
-			int offset = (PAGE_SIZE << order) - s->reserved;
-
-			VM_BUG_ON(s->reserved != sizeof(*head));
-			head = page_address(page) + offset;
-		} else {
-			head = &page->rcu_head;
-		}
-
-		call_rcu(head, rcu_free_slab);
+		call_rcu(&page->rcu_head, rcu_free_slab);
 	} else
 		__free_slab(s, page);
 }
@@ -3578,9 +3558,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	s->random = get_random_long();
 #endif
 
-	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
-		s->reserved = sizeof(struct rcu_head);
-
 	if (!calculate_sizes(s, -1))
 		goto error;
 	if (disable_higher_order_debug) {
-- 
2.17.0



* [PATCH v6 15/17] slub: Remove kmem_cache->reserved
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (13 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 14/17] slab,slub: Remove rcu_head size checks Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 16/17] slub: Remove 'reserved' file from sysfs Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 17/17] mm: Distinguish VMalloc pages Matthew Wilcox
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

The reserved field was only used for embedding an rcu_head in the data
structure.  With the previous commit, we no longer need it.  That lets
us remove the 'reserved' argument to a lot of functions.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 include/linux/slub_def.h |  1 -
 mm/slub.c                | 41 ++++++++++++++++++++--------------------
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 3773e26c08c1..09fa2c6f0e68 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -101,7 +101,6 @@ struct kmem_cache {
 	void (*ctor)(void *);
 	unsigned int inuse;		/* Offset to metadata */
 	unsigned int align;		/* Alignment */
-	unsigned int reserved;		/* Reserved bytes at the end of slabs */
 	unsigned int red_left_pad;	/* Left redzone padding size */
 	const char *name;	/* Name (only for display!) */
 	struct list_head list;	/* List of slab caches */
diff --git a/mm/slub.c b/mm/slub.c
index 8e2407f69855..33a811168fa9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -316,16 +316,16 @@ static inline unsigned int slab_index(void *p, struct kmem_cache *s, void *addr)
 	return (p - addr) / s->size;
 }
 
-static inline unsigned int order_objects(unsigned int order, unsigned int size, unsigned int reserved)
+static inline unsigned int order_objects(unsigned int order, unsigned int size)
 {
-	return (((unsigned int)PAGE_SIZE << order) - reserved) / size;
+	return ((unsigned int)PAGE_SIZE << order) / size;
 }
 
 static inline struct kmem_cache_order_objects oo_make(unsigned int order,
-		unsigned int size, unsigned int reserved)
+		unsigned int size)
 {
 	struct kmem_cache_order_objects x = {
-		(order << OO_SHIFT) + order_objects(order, size, reserved)
+		(order << OO_SHIFT) + order_objects(order, size)
 	};
 
 	return x;
@@ -832,7 +832,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
 		return 1;
 
 	start = page_address(page);
-	length = (PAGE_SIZE << compound_order(page)) - s->reserved;
+	length = PAGE_SIZE << compound_order(page);
 	end = start + length;
 	remainder = length % s->size;
 	if (!remainder)
@@ -921,7 +921,7 @@ static int check_slab(struct kmem_cache *s, struct page *page)
 		return 0;
 	}
 
-	maxobj = order_objects(compound_order(page), s->size, s->reserved);
+	maxobj = order_objects(compound_order(page), s->size);
 	if (page->objects > maxobj) {
 		slab_err(s, page, "objects %u > max %u",
 			page->objects, maxobj);
@@ -971,7 +971,7 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
 		nr++;
 	}
 
-	max_objects = order_objects(compound_order(page), s->size, s->reserved);
+	max_objects = order_objects(compound_order(page), s->size);
 	if (max_objects > MAX_OBJS_PER_PAGE)
 		max_objects = MAX_OBJS_PER_PAGE;
 
@@ -3188,21 +3188,21 @@ static unsigned int slub_min_objects;
  */
 static inline unsigned int slab_order(unsigned int size,
 		unsigned int min_objects, unsigned int max_order,
-		unsigned int fract_leftover, unsigned int reserved)
+		unsigned int fract_leftover)
 {
 	unsigned int min_order = slub_min_order;
 	unsigned int order;
 
-	if (order_objects(min_order, size, reserved) > MAX_OBJS_PER_PAGE)
+	if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
 		return get_order(size * MAX_OBJS_PER_PAGE) - 1;
 
-	for (order = max(min_order, (unsigned int)get_order(min_objects * size + reserved));
+	for (order = max(min_order, (unsigned int)get_order(min_objects * size));
 			order <= max_order; order++) {
 
 		unsigned int slab_size = (unsigned int)PAGE_SIZE << order;
 		unsigned int rem;
 
-		rem = (slab_size - reserved) % size;
+		rem = slab_size % size;
 
 		if (rem <= slab_size / fract_leftover)
 			break;
@@ -3211,7 +3211,7 @@ static inline unsigned int slab_order(unsigned int size,
 	return order;
 }
 
-static inline int calculate_order(unsigned int size, unsigned int reserved)
+static inline int calculate_order(unsigned int size)
 {
 	unsigned int order;
 	unsigned int min_objects;
@@ -3228,7 +3228,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 	min_objects = slub_min_objects;
 	if (!min_objects)
 		min_objects = 4 * (fls(nr_cpu_ids) + 1);
-	max_objects = order_objects(slub_max_order, size, reserved);
+	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
 	while (min_objects > 1) {
@@ -3237,7 +3237,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 		fraction = 16;
 		while (fraction >= 4) {
 			order = slab_order(size, min_objects,
-					slub_max_order, fraction, reserved);
+					slub_max_order, fraction);
 			if (order <= slub_max_order)
 				return order;
 			fraction /= 2;
@@ -3249,14 +3249,14 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
 	 * We were unable to place multiple objects in a slab. Now
 	 * lets see if we can place a single object there.
 	 */
-	order = slab_order(size, 1, slub_max_order, 1, reserved);
+	order = slab_order(size, 1, slub_max_order, 1);
 	if (order <= slub_max_order)
 		return order;
 
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
-	order = slab_order(size, 1, MAX_ORDER, 1, reserved);
+	order = slab_order(size, 1, MAX_ORDER, 1);
 	if (order < MAX_ORDER)
 		return order;
 	return -ENOSYS;
@@ -3524,7 +3524,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	if (forced_order >= 0)
 		order = forced_order;
 	else
-		order = calculate_order(size, s->reserved);
+		order = calculate_order(size);
 
 	if ((int)order < 0)
 		return 0;
@@ -3542,8 +3542,8 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	/*
 	 * Determine the number of objects per slab
 	 */
-	s->oo = oo_make(order, size, s->reserved);
-	s->min = oo_make(get_order(size), size, s->reserved);
+	s->oo = oo_make(order, size);
+	s->min = oo_make(get_order(size), size);
 	if (oo_objects(s->oo) > oo_objects(s->max))
 		s->max = s->oo;
 
@@ -3553,7 +3553,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 {
 	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
-	s->reserved = 0;
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
 	s->random = get_random_long();
 #endif
@@ -5097,7 +5096,7 @@ SLAB_ATTR_RO(destroy_by_rcu);
 
 static ssize_t reserved_show(struct kmem_cache *s, char *buf)
 {
-	return sprintf(buf, "%u\n", s->reserved);
+	return sprintf(buf, "0\n");
 }
 SLAB_ATTR_RO(reserved);
 
-- 
2.17.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6 16/17] slub: Remove 'reserved' file from sysfs
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (14 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 15/17] slub: Remove kmem_cache->reserved Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-18 19:45 ` [PATCH v6 17/17] mm: Distinguish VMalloc pages Matthew Wilcox
  16 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

Christoph doubts anyone was using the 'reserved' file in sysfs, so
remove it.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/slub.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 33a811168fa9..4d77fd8450fa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5094,12 +5094,6 @@ static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
 }
 SLAB_ATTR_RO(destroy_by_rcu);
 
-static ssize_t reserved_show(struct kmem_cache *s, char *buf)
-{
-	return sprintf(buf, "0\n");
-}
-SLAB_ATTR_RO(reserved);
-
 #ifdef CONFIG_SLUB_DEBUG
 static ssize_t slabs_show(struct kmem_cache *s, char *buf)
 {
@@ -5412,7 +5406,6 @@ static struct attribute *slab_attrs[] = {
 	&reclaim_account_attr.attr,
 	&destroy_by_rcu_attr.attr,
 	&shrink_attr.attr,
-	&reserved_attr.attr,
 	&slabs_cpu_partial_attr.attr,
 #ifdef CONFIG_SLUB_DEBUG
 	&total_objects_attr.attr,
-- 
2.17.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
                   ` (15 preceding siblings ...)
  2018-05-18 19:45 ` [PATCH v6 16/17] slub: Remove 'reserved' file from sysfs Matthew Wilcox
@ 2018-05-18 19:45 ` Matthew Wilcox
  2018-05-22 16:10   ` Andrey Ryabinin
  16 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-18 19:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

From: Matthew Wilcox <mawilcox@microsoft.com>

For diagnosing various performance and memory-leak problems, it is helpful
to be able to distinguish pages which are in use as VMalloc pages.
Unfortunately, we cannot use the page_type field in struct page, as
this is in use for mapcount by some drivers which map vmalloced pages
to userspace.

Use a special page->mapping value to distinguish VMalloc pages from
other kinds of pages.  Also record a pointer to the vm_struct and the
offset within the area in struct page to help reconstruct exactly what
this page is being used for.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/proc/page.c                         |  2 ++
 include/linux/mm_types.h               |  5 +++++
 include/linux/page-flags.h             | 25 +++++++++++++++++++++++++
 include/uapi/linux/kernel-page-flags.h |  1 +
 mm/vmalloc.c                           |  5 ++++-
 tools/vm/page-types.c                  |  1 +
 6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 792c78a49174..fc83dae1af7b 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -156,6 +156,8 @@ u64 stable_page_flags(struct page *page)
 		u |= 1 << KPF_BALLOON;
 	if (PageTable(page))
 		u |= 1 << KPF_PGTABLE;
+	if (PageVMalloc(page))
+		u |= 1 << KPF_VMALLOC;
 
 	if (page_is_idle(page))
 		u |= 1 << KPF_IDLE;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 530a9a2b039b..9a3b677e2c1d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -146,6 +146,11 @@ struct page {
 			spinlock_t ptl;
 #endif
 		};
+		struct {	/* VMalloc pages */
+			struct vm_struct *vm_area;
+			unsigned long vm_offset;
+			unsigned long _vm_id;	/* MAPPING_VMalloc */
+		};
 		struct {	/* ZONE_DEVICE pages */
 			/** @pgmap: Points to the hosting device page map. */
 			struct dev_pagemap *pgmap;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 901943e4754b..5232433175c1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -699,6 +699,31 @@ PAGE_TYPE_OPS(Kmemcg, kmemcg)
  */
 PAGE_TYPE_OPS(Table, table)
 
+/*
+ * vmalloc pages may be mapped to userspace, so we need some other way
+ * to distinguish them from other kinds of pages.  Use page->mapping
+ * for this purpose.  Values below 0x1000 cannot be real pointers.
+ */
+#define MAPPING_VMalloc		(void *)0x440
+
+#define PAGE_MAPPING_OPS(name)						\
+static __always_inline int Page##name(struct page *page)		\
+{									\
+	return page->mapping == MAPPING_##name;				\
+}									\
+static __always_inline void __SetPage##name(struct page *page)		\
+{									\
+	VM_BUG_ON_PAGE(page->mapping != NULL, page);			\
+	page->mapping = MAPPING_##name;					\
+}									\
+static __always_inline void __ClearPage##name(struct page *page)	\
+{									\
+	VM_BUG_ON_PAGE(page->mapping != MAPPING_##name, page);		\
+	page->mapping = NULL;						\
+}
+
+PAGE_MAPPING_OPS(VMalloc)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index 21b9113c69da..6800968b8f47 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -36,5 +36,6 @@
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
 #define KPF_PGTABLE		26
+#define KPF_VMALLOC		27
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5fbf27e7f956..98bc690d472d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1535,7 +1535,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
 		for (i = 0; i < area->nr_pages; i++) {
 			struct page *page = area->pages[i];
 
-			BUG_ON(!page);
+			__ClearPageVMalloc(page);
 			__free_pages(page, 0);
 		}
 
@@ -1704,6 +1704,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			area->nr_pages = i;
 			goto fail;
 		}
+		__SetPageVMalloc(page);
+		page->vm_area = area;
+		page->vm_offset = i;
 		area->pages[i] = page;
 		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index cce853dca691..25cc21855be4 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -132,6 +132,7 @@ static const char * const page_flag_names[] = {
 	[KPF_THP]		= "t:thp",
 	[KPF_BALLOON]		= "o:balloon",
 	[KPF_PGTABLE]		= "g:pgtable",
+	[KPF_VMALLOC]		= "V:vmalloc",
 	[KPF_ZERO_PAGE]		= "z:zero_page",
 	[KPF_IDLE]              = "i:idle_page",
 
-- 
2.17.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-18 19:45 ` [PATCH v6 17/17] mm: Distinguish VMalloc pages Matthew Wilcox
@ 2018-05-22 16:10   ` Andrey Ryabinin
  2018-05-22 17:58     ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Andrey Ryabinin @ 2018-05-22 16:10 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm
  Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse



On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> For diagnosing various performance and memory-leak problems, it is helpful
> to be able to distinguish pages which are in use as VMalloc pages.
> Unfortunately, we cannot use the page_type field in struct page, as
> this is in use for mapcount by some drivers which map vmalloced pages
> to userspace.
> 
> Use a special page->mapping value to distinguish VMalloc pages from
> other kinds of pages.  Also record a pointer to the vm_struct and the
> offset within the area in struct page to help reconstruct exactly what
> this page is being used for.
> 


This seems useless. page->vm_area and page->vm_offset are never used.
There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
and no explanation of how it can be used in its current form.

Also, this patch breaks code like this:
	if (mapping = page_mapping(page))
		// access mapping

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 16:10   ` Andrey Ryabinin
@ 2018-05-22 17:58     ` Matthew Wilcox
  2018-05-22 19:49       ` Andrew Morton
  2018-05-22 19:57       ` Andrey Ryabinin
  0 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-22 17:58 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> > 
> > For diagnosing various performance and memory-leak problems, it is helpful
> > to be able to distinguish pages which are in use as VMalloc pages.
> > Unfortunately, we cannot use the page_type field in struct page, as
> > this is in use for mapcount by some drivers which map vmalloced pages
> > to userspace.
> > 
> > Use a special page->mapping value to distinguish VMalloc pages from
> > other kinds of pages.  Also record a pointer to the vm_struct and the
> > offset within the area in struct page to help reconstruct exactly what
> > this page is being used for.
> 
> This seems useless. page->vm_area and page->vm_offset are never used.
> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> and no explanation of how it can be used in its current form.

Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
and similar to examine the kernel's memory.  Leaving these breadcrumbs
is helpful, and those fields simply weren't in use before.

> Also, this patch breaks code like this:
> 	if (mapping = page_mapping(page))
> 		// access mapping

Example of broken code, please?  Pages allocated from the page allocator
with alloc_page() come with page->mapping == NULL.  This code snippet
would not have granted access to vmalloc pages before.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 17:58     ` Matthew Wilcox
@ 2018-05-22 19:49       ` Andrew Morton
  2018-05-22 19:57       ` Andrey Ryabinin
  1 sibling, 0 replies; 32+ messages in thread
From: Andrew Morton @ 2018-05-22 19:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrey Ryabinin, linux-mm, Matthew Wilcox, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Tue, 22 May 2018 10:58:36 -0700 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> > On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <mawilcox@microsoft.com>
> > > 
> > > For diagnosing various performance and memory-leak problems, it is helpful
> > > to be able to distinguish pages which are in use as VMalloc pages.
> > > Unfortunately, we cannot use the page_type field in struct page, as
> > > this is in use for mapcount by some drivers which map vmalloced pages
> > > to userspace.
> > > 
> > > Use a special page->mapping value to distinguish VMalloc pages from
> > > other kinds of pages.  Also record a pointer to the vm_struct and the
> > > offset within the area in struct page to help reconstruct exactly what
> > > this page is being used for.
> > 
> > This seems useless. page->vm_area and page->vm_offset are never used.
> > There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> > and no explanation of how it can be used in its current form.
> 
> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> and similar to examine the kernel's memory.  Leaving these breadcrumbs
> is helpful, and those fields simply weren't in use before.

I added this to the changelog:

: No in-kernel code uses the new KPF_VMALLOC.  Like the other KPF_*
: flags, it is for use by tools/vm/page-types.c.

> > Also, this patch breaks code like this:
> > 	if (mapping = page_mapping(page))
> > 		// access mapping
> 
> Example of broken code, please?  Pages allocated from the page allocator
> with alloc_page() come with page->mapping == NULL.  This code snippet
> would not have granted access to vmalloc pages before.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 17:58     ` Matthew Wilcox
  2018-05-22 19:49       ` Andrew Morton
@ 2018-05-22 19:57       ` Andrey Ryabinin
  2018-05-22 20:19         ` Matthew Wilcox
  2018-05-23  6:34         ` Michal Hocko
  1 sibling, 2 replies; 32+ messages in thread
From: Andrey Ryabinin @ 2018-05-22 19:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse



On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
>>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>>
>>> For diagnosing various performance and memory-leak problems, it is helpful
>>> to be able to distinguish pages which are in use as VMalloc pages.
>>> Unfortunately, we cannot use the page_type field in struct page, as
>>> this is in use for mapcount by some drivers which map vmalloced pages
>>> to userspace.
>>>
>>> Use a special page->mapping value to distinguish VMalloc pages from
>>> other kinds of pages.  Also record a pointer to the vm_struct and the
>>> offset within the area in struct page to help reconstruct exactly what
>>> this page is being used for.
>>
>> This seems useless. page->vm_area and page->vm_offset are never used.
>> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
>> and no explanation of how it can be used in its current form.
> 
> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> and similar to examine the kernel's memory.  Leaving these breadcrumbs
> is helpful, and those fields simply weren't in use before.
> 
>> Also, this patch breaks code like this:
>> 	if (mapping = page_mapping(page))
>> 		// access mapping
> 
> Example of broken code, please?  Pages allocated from the page allocator
> with alloc_page() come with page->mapping == NULL.  This code snippet
> would not have granted access to vmalloc pages before.
> 

Some implementations of flush_dcache_page(), and also set_page_dirty(), can be called
on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 19:57       ` Andrey Ryabinin
@ 2018-05-22 20:19         ` Matthew Wilcox
  2018-05-22 20:48           ` Andrew Morton
  2018-05-23  6:36           ` Michal Hocko
  2018-05-23  6:34         ` Michal Hocko
  1 sibling, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-22 20:19 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Tue, May 22, 2018 at 10:57:34PM +0300, Andrey Ryabinin wrote:
> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> >> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> >>> From: Matthew Wilcox <mawilcox@microsoft.com>
> >>>
> >>> For diagnosing various performance and memory-leak problems, it is helpful
> >>> to be able to distinguish pages which are in use as VMalloc pages.
> >>> Unfortunately, we cannot use the page_type field in struct page, as
> >>> this is in use for mapcount by some drivers which map vmalloced pages
> >>> to userspace.
> >>>
> >>> Use a special page->mapping value to distinguish VMalloc pages from
> >>> other kinds of pages.  Also record a pointer to the vm_struct and the
> >>> offset within the area in struct page to help reconstruct exactly what
> >>> this page is being used for.
> >>
> >> This seems useless. page->vm_area and page->vm_offset are never used.
> >> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> >> and no explanation of how it can be used in its current form.
> > 
> > Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> > are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> > and similar to examine the kernel's memory.  Leaving these breadcrumbs
> > is helpful, and those fields simply weren't in use before.
> > 
> >> Also, this patch breaks code like this:
> >> 	if (mapping = page_mapping(page))
> >> 		// access mapping
> > 
> > Example of broken code, please?  Pages allocated from the page allocator
> > with alloc_page() come with page->mapping == NULL.  This code snippet
> > would not have granted access to vmalloc pages before.
> > 
> 
> Some implementations of flush_dcache_page(), and also set_page_dirty(), can be called
> on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()

Ah, good catch!  I'm anticipating we'll have other special values for
page->mapping in the future, so how about this?

(no changelog because I assume Andrew will add this as a -fix patch)

diff --git a/mm/util.c b/mm/util.c
index 10ca6f1d5c75..be81c9052ef7 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -561,6 +561,8 @@ struct address_space *page_mapping(struct page *page)
 	mapping = page->mapping;
 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
+	if ((unsigned long)mapping < PAGE_SIZE)
+		return NULL;
 
 	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 20:19         ` Matthew Wilcox
@ 2018-05-22 20:48           ` Andrew Morton
  2018-05-22 21:45             ` Matthew Wilcox
  2018-05-23  6:36           ` Michal Hocko
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2018-05-22 20:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrey Ryabinin, linux-mm, Matthew Wilcox, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Tue, 22 May 2018 13:19:58 -0700 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, May 22, 2018 at 10:57:34PM +0300, Andrey Ryabinin wrote:
> > On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> > > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> > >> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> > >>> From: Matthew Wilcox <mawilcox@microsoft.com>
> > >>>
> > >>> For diagnosing various performance and memory-leak problems, it is helpful
> > >>> to be able to distinguish pages which are in use as VMalloc pages.
> > >>> Unfortunately, we cannot use the page_type field in struct page, as
> > >>> this is in use for mapcount by some drivers which map vmalloced pages
> > >>> to userspace.
> > >>>
> > >>> Use a special page->mapping value to distinguish VMalloc pages from
> > >>> other kinds of pages.  Also record a pointer to the vm_struct and the
> > >>> offset within the area in struct page to help reconstruct exactly what
> > >>> this page is being used for.
> > >>
> > >> This seems useless. page->vm_area and page->vm_offset are never used.
> > >> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> > >> and no explanation of how it can be used in its current form.
> > > 
> > > Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> > > are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> > > and similar to examine the kernel's memory.  Leaving these breadcrumbs
> > > is helpful, and those fields simply weren't in use before.
> > > 
> > >> Also, this patch breaks code like this:
> > >> 	if (mapping = page_mapping(page))
> > >> 		// access mapping
> > > 
> > > Example of broken code, please?  Pages allocated from the page allocator
> > > with alloc_page() come with page->mapping == NULL.  This code snippet
> > > would not have granted access to vmalloc pages before.
> > > 
> > 
> > Some implementations of flush_dcache_page(), and also set_page_dirty(), can be called
> > on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()
> 
> Ah, good catch!  I'm anticipating we'll have other special values for
> page->mapping in the future, so how about this?
> 
> (no changelog because I assume Andrew will add this as a -fix patch)

I give the -fix patches a single-line summary in the final rollup.

> diff --git a/mm/util.c b/mm/util.c
> index 10ca6f1d5c75..be81c9052ef7 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -561,6 +561,8 @@ struct address_space *page_mapping(struct page *page)
>  	mapping = page->mapping;
>  	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
>  		return NULL;
> +	if ((unsigned long)mapping < PAGE_SIZE)
> +		return NULL;
>  
>  	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>  }

-ENOCOMMENT ;)

--- a/mm/util.c~mm-distinguish-vmalloc-pages-fix-fix
+++ a/mm/util.c
@@ -512,6 +512,8 @@ struct address_space *page_mapping(struc
 	mapping = page->mapping;
 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
+
+	/* Don't trip over a vmalloc page's MAPPING_VMalloc cookie */
 	if ((unsigned long)mapping < PAGE_SIZE)
 		return NULL;
 
It's a bit sad to put even more stuff into page_mapping() just for
page_types diddling.  Is this really justified?  How many people will
use it, and get significant benefit from it?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 20:48           ` Andrew Morton
@ 2018-05-22 21:45             ` Matthew Wilcox
  2018-05-22 23:02               ` Andrew Morton
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2018-05-22 21:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrey Ryabinin, linux-mm, Matthew Wilcox, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Tue, May 22, 2018 at 01:48:38PM -0700, Andrew Morton wrote:
> -ENOCOMMENT ;)
> 
> --- a/mm/util.c~mm-distinguish-vmalloc-pages-fix-fix
> +++ a/mm/util.c
> @@ -512,6 +512,8 @@ struct address_space *page_mapping(struc
>  	mapping = page->mapping;
>  	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
>  		return NULL;
> +
> +	/* Don't trip over a vmalloc page's MAPPING_VMalloc cookie */
>  	if ((unsigned long)mapping < PAGE_SIZE)
>  		return NULL;
>  
> It's a bit sad to put even more stuff into page_mapping() just for
> page_types diddling.  Is this really justified?  How many people will
> use it, and get significant benefit from it?

We could leave page->mapping NULL for vmalloc pages.  We just need to
find a spot where we can put a unique identifier.  The first word of
the union looks like a strong candidate; bit 0 is already reserved for
PageTail.  The other users are list_head.prev, a struct page *, and
struct dev_pagemap *, so that should work out OK.

If you want to just drop this patch, I'd be OK with that.  I can always
submit it to you again next merge window.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 21:45             ` Matthew Wilcox
@ 2018-05-22 23:02               ` Andrew Morton
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Morton @ 2018-05-22 23:02 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrey Ryabinin, linux-mm, Matthew Wilcox, Kirill A . Shutemov,
	Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
	Dave Hansen, Jérôme Glisse

On Tue, 22 May 2018 14:45:17 -0700 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, May 22, 2018 at 01:48:38PM -0700, Andrew Morton wrote:
> > -ENOCOMMENT ;)
> > 
> > --- a/mm/util.c~mm-distinguish-vmalloc-pages-fix-fix
> > +++ a/mm/util.c
> > @@ -512,6 +512,8 @@ struct address_space *page_mapping(struc
> >  	mapping = page->mapping;
> >  	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
> >  		return NULL;
> > +
> > +	/* Don't trip over a vmalloc page's MAPPING_VMalloc cookie */
> >  	if ((unsigned long)mapping < PAGE_SIZE)
> >  		return NULL;
> >  
> > It's a bit sad to put even more stuff into page_mapping() just for
> > page_types diddling.  Is this really justified?  How many people will
> > use it, and get significant benefit from it?
> 
> We could leave page->mapping NULL for vmalloc pages.  We just need to
> find a spot where we can put a unique identifier.  The first word of
> the union looks like a string candidate; bit 0 is already reserved for
> PageTail.  The other users are list_head.prev, a struct page *, and
> struct dev_pagemap *, so that should work out OK.
> 
> If you want to just drop this patch, I'd be OK with that.  I can always
> submit it to you again next merge window.

OK, let's park it for now.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 19:57       ` Andrey Ryabinin
  2018-05-22 20:19         ` Matthew Wilcox
@ 2018-05-23  6:34         ` Michal Hocko
  2018-05-23  9:14           ` Andrey Ryabinin
  1 sibling, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-05-23  6:34 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Matthew Wilcox, linux-mm, Matthew Wilcox, Andrew Morton,
	Kirill A . Shutemov, Christoph Lameter, Lai Jiangshan,
	Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
> 
> 
> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> >> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> >>> From: Matthew Wilcox <mawilcox@microsoft.com>
> >>>
> >>> For diagnosing various performance and memory-leak problems, it is helpful
> >>> to be able to distinguish pages which are in use as VMalloc pages.
> >>> Unfortunately, we cannot use the page_type field in struct page, as
> >>> this is in use for mapcount by some drivers which map vmalloced pages
> >>> to userspace.
> >>>
> >>> Use a special page->mapping value to distinguish VMalloc pages from
> >>> other kinds of pages.  Also record a pointer to the vm_struct and the
> >>> offset within the area in struct page to help reconstruct exactly what
> >>> this page is being used for.
> >>
> >> This seems useless. page->vm_area and page->vm_offset are never used.
> >> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> >> and no explanation of how it can be used in its current form.
> > 
> > Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> > are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> > and similar to examine the kernel's memory.  Leaving these breadcrumbs
> > is helpful, and those fields simply weren't in use before.
> > 
> >> Also, this patch breaks code like this:
> >> 	if (mapping = page_mapping(page))
> >> 		// access mapping
> > 
> > Example of broken code, please?  Pages allocated from the page allocator
> > with alloc_page() come with page->mapping == NULL.  This code snippet
> > would not have granted access to vmalloc pages before.
> > 
> 
> Some implementations of flush_dcache_page() use page_mapping(); also, set_page_dirty() can be called
> on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()

Do you have any specific example? Why would anybody map vmalloc pages to
the userspace? flush_dcache_page on a vmalloc page sounds quite
unexpected to me as well.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-22 20:19         ` Matthew Wilcox
  2018-05-22 20:48           ` Andrew Morton
@ 2018-05-23  6:36           ` Michal Hocko
  1 sibling, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2018-05-23  6:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrey Ryabinin, linux-mm, Matthew Wilcox, Andrew Morton,
	Kirill A . Shutemov, Christoph Lameter, Lai Jiangshan,
	Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Tue 22-05-18 13:19:58, Matthew Wilcox wrote:
> On Tue, May 22, 2018 at 10:57:34PM +0300, Andrey Ryabinin wrote:
> > On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> > > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> > >> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> > >>> From: Matthew Wilcox <mawilcox@microsoft.com>
> > >>>
> > >>> For diagnosing various performance and memory-leak problems, it is helpful
> > >>> to be able to distinguish pages which are in use as VMalloc pages.
> > >>> Unfortunately, we cannot use the page_type field in struct page, as
> > >>> this is in use for mapcount by some drivers which map vmalloced pages
> > >>> to userspace.
> > >>>
> > >>> Use a special page->mapping value to distinguish VMalloc pages from
> > >>> other kinds of pages.  Also record a pointer to the vm_struct and the
> > >>> offset within the area in struct page to help reconstruct exactly what
> > >>> this page is being used for.
> > >>
> > >> This seems useless. page->vm_area and page->vm_offset are never used.
> > >> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> > >> and no explanation of how it can be used in its current form.
> > > 
> > > Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> > > are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> > > and similar to examine the kernel's memory.  Leaving these breadcrumbs
> > > is helpful, and those fields simply weren't in use before.
> > > 
> > >> Also, this patch breaks code like this:
> > >> 	if (mapping = page_mapping(page))
> > >> 		// access mapping
> > > 
> > > Example of broken code, please?  Pages allocated from the page allocator
> > > with alloc_page() come with page->mapping == NULL.  This code snippet
> > > would not have granted access to vmalloc pages before.
> > > 
> > 
> > Some implementations of flush_dcache_page() use page_mapping(); also, set_page_dirty() can be called
> > on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()
> 
> Ah, good catch!  I'm anticipating we'll have other special values for
> page->mapping in the future, so how about this?
> 
> (no changelog because I assume Andrew will add this as a -fix patch)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 10ca6f1d5c75..be81c9052ef7 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -561,6 +561,8 @@ struct address_space *page_mapping(struct page *page)
>  	mapping = page->mapping;
>  	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
>  		return NULL;
> +	if ((unsigned long)mapping < PAGE_SIZE)
> +		return NULL;
>  
>  	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>  }

Well, this would be quite unfortunate. We do not want to pay a branch
price for something that doesn't have a _real_ user. Which is kinda sad
because I found the explicit vmalloc page "flag" nice to have (if it was
for free basically).

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-23  6:34         ` Michal Hocko
@ 2018-05-23  9:14           ` Andrey Ryabinin
  2018-05-23  9:25             ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Andrey Ryabinin @ 2018-05-23  9:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, linux-mm, Matthew Wilcox, Andrew Morton,
	Kirill A . Shutemov, Christoph Lameter, Lai Jiangshan,
	Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse



On 05/23/2018 09:34 AM, Michal Hocko wrote:
> On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
>>
>>
>> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
>>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
>>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
>>>>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>>>>
>>>>> For diagnosing various performance and memory-leak problems, it is helpful
>>>>> to be able to distinguish pages which are in use as VMalloc pages.
>>>>> Unfortunately, we cannot use the page_type field in struct page, as
>>>>> this is in use for mapcount by some drivers which map vmalloced pages
>>>>> to userspace.
>>>>>
>>>>> Use a special page->mapping value to distinguish VMalloc pages from
>>>>> other kinds of pages.  Also record a pointer to the vm_struct and the
>>>>> offset within the area in struct page to help reconstruct exactly what
>>>>> this page is being used for.
>>>>
>>>> This seems useless. page->vm_area and page->vm_offset are never used.
>>>> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
>>>> and no explanation of how it can be used in its current form.
>>>
>>> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
>>> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
>>> and similar to examine the kernel's memory.  Leaving these breadcrumbs
>>> is helpful, and those fields simply weren't in use before.
>>>
>>>> Also, this patch breaks code like this:
>>>> 	if (mapping = page_mapping(page))
>>>> 		// access mapping
>>>
>>> Example of broken code, please?  Pages allocated from the page allocator
>>> with alloc_page() come with page->mapping == NULL.  This code snippet
>>> would not have granted access to vmalloc pages before.
>>>
>>
>> Some implementations of flush_dcache_page() use page_mapping(); also, set_page_dirty() can be called
>> on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()
> 
> Do you have any specific example?

git grep -e remap_vmalloc_range -e vmalloc_user

But that's not all: vmalloc*() + vmalloc_to_page() + vm_insert_page() are other candidates.

> Why would anybody map vmalloc pages to the userspace?

To have shared memory between userspace and the kernel.

> flush_dcache_page on a vmalloc page sounds quite
> unexpected to me as well.
> 

remap_vmalloc_range()->vm_insert_page()->insert_page()->flush_dcache_page()

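The call chain Andrey cites corresponds to the usual pattern for exporting a vmalloc'ed buffer to userspace from a driver's mmap handler. Roughly, and purely as an illustrative sketch (not a buildable module; the names shared_buf and my_mmap are hypothetical, and error handling is omitted):

	/* Kernel-style sketch: share a virtually contiguous buffer with
	 * userspace, the pattern under discussion.
	 */
	static void *shared_buf;	/* allocated with vmalloc_user() at init */

	static int my_mmap(struct file *file, struct vm_area_struct *vma)
	{
		/* Inserts each vmalloc page into the user VMA; internally:
		 * vm_insert_page() -> insert_page() -> flush_dcache_page()
		 */
		return remap_vmalloc_range(vma, shared_buf, 0 /* pgoff */);
	}

Any page_mapping()-based logic reached through that chain would therefore see the vmalloc pages' mapping field.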

* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-23  9:14           ` Andrey Ryabinin
@ 2018-05-23  9:25             ` Michal Hocko
  2018-05-23  9:28               ` Andrey Ryabinin
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-05-23  9:25 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Matthew Wilcox, linux-mm, Matthew Wilcox, Andrew Morton,
	Kirill A . Shutemov, Christoph Lameter, Lai Jiangshan,
	Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Wed 23-05-18 12:14:10, Andrey Ryabinin wrote:
> 
> 
> On 05/23/2018 09:34 AM, Michal Hocko wrote:
> > On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
> >>
> >>
> >> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> >>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> >>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> >>>>> From: Matthew Wilcox <mawilcox@microsoft.com>
> >>>>>
> >>>>> For diagnosing various performance and memory-leak problems, it is helpful
> >>>>> to be able to distinguish pages which are in use as VMalloc pages.
> >>>>> Unfortunately, we cannot use the page_type field in struct page, as
> >>>>> this is in use for mapcount by some drivers which map vmalloced pages
> >>>>> to userspace.
> >>>>>
> >>>>> Use a special page->mapping value to distinguish VMalloc pages from
> >>>>> other kinds of pages.  Also record a pointer to the vm_struct and the
> >>>>> offset within the area in struct page to help reconstruct exactly what
> >>>>> this page is being used for.
> >>>>
> >>>> This seems useless. page->vm_area and page->vm_offset are never used.
> >>>> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> >>>> and no explanation of how it can be used in its current form.
> >>>
> >>> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> >>> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> >>> and similar to examine the kernel's memory.  Leaving these breadcrumbs
> >>> is helpful, and those fields simply weren't in use before.
> >>>
> >>>> Also, this patch breaks code like this:
> >>>> 	if (mapping = page_mapping(page))
> >>>> 		// access mapping
> >>>
> >>> Example of broken code, please?  Pages allocated from the page allocator
> >>> with alloc_page() come with page->mapping == NULL.  This code snippet
> >>> would not have granted access to vmalloc pages before.
> >>>
> >>
> >> Some implementations of flush_dcache_page() use page_mapping(); also, set_page_dirty() can be called
> >> on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()
> > 
> > Do you have any specific example?
> 
> git grep -e remap_vmalloc_range -e vmalloc_user
> 
> But that's not all: vmalloc*() + vmalloc_to_page() + vm_insert_page() are other candidates.

Thanks for the pointer. I was not aware of remap_vmalloc_range.
> 
> > Why would anybody map vmalloc pages to the userspace?
> 
> To have shared memory between userspace and the kernel.

OK, so the point seems to be to share large physically contiguous memory
with userspace.

> > flush_dcache_page on a vmalloc page sounds quite
> > unexpected to me as well.
> > 
> 
> remap_vmalloc_range()->vm_insert_page()->insert_page()->flush_dcache_page()

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-23  9:25             ` Michal Hocko
@ 2018-05-23  9:28               ` Andrey Ryabinin
  2018-05-23  9:52                 ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Andrey Ryabinin @ 2018-05-23  9:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, linux-mm, Matthew Wilcox, Andrew Morton,
	Kirill A . Shutemov, Christoph Lameter, Lai Jiangshan,
	Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse



On 05/23/2018 12:25 PM, Michal Hocko wrote:
> On Wed 23-05-18 12:14:10, Andrey Ryabinin wrote:
>>
>>
>> On 05/23/2018 09:34 AM, Michal Hocko wrote:
>>> On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
>>>>
>>>>
>>>> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
>>>>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
>>>>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
>>>>>>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>>>>>>
>>>>>>> For diagnosing various performance and memory-leak problems, it is helpful
>>>>>>> to be able to distinguish pages which are in use as VMalloc pages.
>>>>>>> Unfortunately, we cannot use the page_type field in struct page, as
>>>>>>> this is in use for mapcount by some drivers which map vmalloced pages
>>>>>>> to userspace.
>>>>>>>
>>>>>>> Use a special page->mapping value to distinguish VMalloc pages from
>>>>>>> other kinds of pages.  Also record a pointer to the vm_struct and the
>>>>>>> offset within the area in struct page to help reconstruct exactly what
>>>>>>> this page is being used for.
>>>>>>
>>>>>> This seems useless. page->vm_area and page->vm_offset are never used.
>>>>>> There are no follow-up patches which use this new information 'For diagnosing various performance and memory-leak problems',
>>>>>> and no explanation of how it can be used in its current form.
>>>>>
>>>>> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
>>>>> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
>>>>> and similar to examine the kernel's memory.  Leaving these breadcrumbs
>>>>> is helpful, and those fields simply weren't in use before.
>>>>>
>>>>>> Also, this patch breaks code like this:
>>>>>> 	if (mapping = page_mapping(page))
>>>>>> 		// access mapping
>>>>>
>>>>> Example of broken code, please?  Pages allocated from the page allocator
>>>>> with alloc_page() come with page->mapping == NULL.  This code snippet
>>>>> would not have granted access to vmalloc pages before.
>>>>>
>>>>
>>>> Some implementations of flush_dcache_page() use page_mapping(); also, set_page_dirty() can be called
>>>> on userspace-mapped vmalloc pages during unmap: zap_pte_range() -> set_page_dirty()
>>>
>>> Do you have any specific example?
>>
>> git grep -e remap_vmalloc_range -e vmalloc_user
>>
>> But that's not all: vmalloc*() + vmalloc_to_page() + vm_insert_page() are other candidates.
> 
> Thanks for the pointer. I was not aware of remap_vmalloc_range.
>>
>>> Why would anybody map vmalloc pages to the userspace?
>>
>> To have shared memory between userspace and the kernel.
> 
> OK, so the point seems to be to share large physically contiguous memory
> with userspace.
> 

Not physically, but virtually contiguous.


* Re: [PATCH v6 17/17] mm: Distinguish VMalloc pages
  2018-05-23  9:28               ` Andrey Ryabinin
@ 2018-05-23  9:52                 ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2018-05-23  9:52 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Matthew Wilcox, linux-mm, Matthew Wilcox, Andrew Morton,
	Kirill A . Shutemov, Christoph Lameter, Lai Jiangshan,
	Pekka Enberg, Vlastimil Babka, Dave Hansen,
	Jérôme Glisse

On Wed 23-05-18 12:28:10, Andrey Ryabinin wrote:
> On 05/23/2018 12:25 PM, Michal Hocko wrote:
> > OK, so the point seems to be to share large physically contiguous memory
> > with userspace.
> > 
> 
> Not physically, but virtually contiguous.

Bleh, you are right! That's what I meant...

-- 
Michal Hocko
SUSE Labs


Thread overview: 32+ messages
2018-05-18 19:45 [PATCH v6 00/17] Rearrange struct page Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 01/17] s390: Use _refcount for pgtables Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 02/17] mm: Split page_type out from _mapcount Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 03/17] mm: Mark pages in use for page tables Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 04/17] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 05/17] mm: Move 'private' union within " Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 06/17] mm: Move _refcount out of struct page union Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 07/17] mm: Combine first three unions in struct page Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 08/17] mm: Use page->deferred_list Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 09/17] mm: Move lru union within struct page Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 10/17] mm: Combine LRU and main union in " Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 11/17] mm: Improve struct page documentation Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 12/17] mm: Add pt_mm to struct page Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 13/17] mm: Add hmm_data " Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 14/17] slab,slub: Remove rcu_head size checks Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 15/17] slub: Remove kmem_cache->reserved Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 16/17] slub: Remove 'reserved' file from sysfs Matthew Wilcox
2018-05-18 19:45 ` [PATCH v6 17/17] mm: Distinguish VMalloc pages Matthew Wilcox
2018-05-22 16:10   ` Andrey Ryabinin
2018-05-22 17:58     ` Matthew Wilcox
2018-05-22 19:49       ` Andrew Morton
2018-05-22 19:57       ` Andrey Ryabinin
2018-05-22 20:19         ` Matthew Wilcox
2018-05-22 20:48           ` Andrew Morton
2018-05-22 21:45             ` Matthew Wilcox
2018-05-22 23:02               ` Andrew Morton
2018-05-23  6:36           ` Michal Hocko
2018-05-23  6:34         ` Michal Hocko
2018-05-23  9:14           ` Andrey Ryabinin
2018-05-23  9:25             ` Michal Hocko
2018-05-23  9:28               ` Andrey Ryabinin
2018-05-23  9:52                 ` Michal Hocko
