* [PATCH v4 00/16] Rearrange struct page
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
As presented at LSFMM last week, this patch set rearranges struct page
to give more usable space to users who have allocated a struct page for
their own purposes. For a graphical before-and-after view, see the first
two tabs of https://docs.google.com/spreadsheets/d/1tvCszs_7FXrjei9_mtFiKV6nW1FLnYyvPvW-qNZhdog/edit?usp=sharing
Highlights:
- slub's counters no longer share space with _refcount.
- slub's freelist+counters are now naturally dword aligned.
- It's now more obvious which fields in struct page are used by which
owners (some owners still take advantage of the union aliasing).
- deferred_list now really exists in struct page instead of just a
comment.
- slub loses a parameter to a lot of functions.
- Several hidden uses of struct page are now documented in code.
Changes v3 -> v4:
- Added acks/reviews from Kirill & Randy
- Removed the call to page_mapcount_reset from slub since it no longer
uses the mapcount union.
- Added pt_mm and hmm_data to struct page
Matthew Wilcox (16):
s390: Use _refcount for pgtables
mm: Split page_type out from _mapcount
mm: Mark pages in use for page tables
mm: Switch s_mem and slab_cache in struct page
mm: Move 'private' union within struct page
mm: Move _refcount out of struct page union
slub: Remove page->counters
mm: Combine first three unions in struct page
mm: Use page->deferred_list
mm: Move lru union within struct page
mm: Combine first two unions in struct page
mm: Improve struct page documentation
mm: Add pt_mm to struct page
mm: Add hmm_data to struct page
slab,slub: Remove rcu_head size checks
slub: Remove kmem_cache->reserved
arch/s390/mm/pgalloc.c | 21 ++-
arch/x86/mm/pgtable.c | 5 +-
fs/proc/page.c | 2 +
include/linux/hmm.h | 8 +-
include/linux/mm.h | 2 +
include/linux/mm_types.h | 218 ++++++++++++-------------
include/linux/page-flags.h | 51 +++---
include/linux/slub_def.h | 1 -
include/uapi/linux/kernel-page-flags.h | 2 +-
kernel/crash_core.c | 1 +
mm/huge_memory.c | 7 +-
mm/page_alloc.c | 17 +-
mm/slab.c | 2 -
mm/slub.c | 138 ++++++----------
scripts/tags.sh | 6 +-
tools/vm/page-types.c | 1 +
16 files changed, 222 insertions(+), 260 deletions(-)
--
2.17.0
* [PATCH v4 01/16] s390: Use _refcount for pgtables
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
s390 borrows the storage used for _mapcount in struct page to track
whether the bottom or top half of the page is in use for 2kB page
tables. I want to use that storage for something else, so use the top
byte of _refcount instead of the bottom byte of _mapcount. _refcount
may temporarily be incremented by other CPUs that see a stale pointer
to this page in the page cache, but each CPU can only increment it by
one, and there are no systems with 2^24 CPUs today, so they will not
change the upper byte of _refcount. We do have to be a little careful
not to lose any of their writes (as they will subsequently decrement
the counter).
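For readers new to this trick, here is a minimal sketch (not part of
the patch; the helper name is hypothetical) of the encoding the code
below manipulates:

/*
 * Hypothetical helper mirroring the arithmetic in this patch: bits
 * 24/25 of _refcount mark the lower/upper 2K half as allocated, and
 * bits 28/29 mark halves pending RCU free.  Speculative page-cache
 * references from other CPUs only touch bits 0-23, so the top byte
 * is stable.
 */
static unsigned int pgtable_half_mask(struct page *page)
{
	unsigned int mask = atomic_read(&page->_refcount) >> 24;

	/* fold the pending-free bits onto the allocated bits */
	return (mask | (mask >> 4)) & 3;	/* 1 = lower, 2 = upper */
}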
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/mm/pgalloc.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 562f72955956..84bd6329a88d 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -190,14 +190,15 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
if (!list_empty(&mm->context.pgtable_list)) {
page = list_first_entry(&mm->context.pgtable_list,
struct page, lru);
- mask = atomic_read(&page->_mapcount);
+ mask = atomic_read(&page->_refcount) >> 24;
mask = (mask | (mask >> 4)) & 3;
if (mask != 3) {
table = (unsigned long *) page_to_phys(page);
bit = mask & 1; /* =1 -> second 2K */
if (bit)
table += PTRS_PER_PTE;
- atomic_xor_bits(&page->_mapcount, 1U << bit);
+ atomic_xor_bits(&page->_refcount,
+ 1U << (bit + 24));
list_del(&page->lru);
}
}
@@ -218,12 +219,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
table = (unsigned long *) page_to_phys(page);
if (mm_alloc_pgste(mm)) {
/* Return 4K page table with PGSTEs */
- atomic_set(&page->_mapcount, 3);
+ atomic_xor_bits(&page->_refcount, 3 << 24);
memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
} else {
/* Return the first 2K fragment of the page */
- atomic_set(&page->_mapcount, 1);
+ atomic_xor_bits(&page->_refcount, 1 << 24);
memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
spin_lock_bh(&mm->context.lock);
list_add(&page->lru, &mm->context.pgtable_list);
@@ -242,7 +243,8 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
/* Free 2K page table fragment of a 4K page */
bit = (__pa(table) & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t));
spin_lock_bh(&mm->context.lock);
- mask = atomic_xor_bits(&page->_mapcount, 1U << bit);
+ mask = atomic_xor_bits(&page->_refcount, 1U << (bit + 24));
+ mask >>= 24;
if (mask & 3)
list_add(&page->lru, &mm->context.pgtable_list);
else
@@ -253,7 +255,6 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
}
pgtable_page_dtor(page);
- atomic_set(&page->_mapcount, -1);
__free_page(page);
}
@@ -274,7 +275,8 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
}
bit = (__pa(table) & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t));
spin_lock_bh(&mm->context.lock);
- mask = atomic_xor_bits(&page->_mapcount, 0x11U << bit);
+ mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
+ mask >>= 24;
if (mask & 3)
list_add_tail(&page->lru, &mm->context.pgtable_list);
else
@@ -296,12 +298,13 @@ static void __tlb_remove_table(void *_table)
break;
case 1: /* lower 2K of a 4K page table */
case 2: /* higher 2K of a 4K page table */
- if (atomic_xor_bits(&page->_mapcount, mask << 4) != 0)
+ mask = atomic_xor_bits(&page->_refcount, mask << (4 + 24));
+ mask >>= 24;
+ if (mask != 0)
break;
/* fallthrough */
case 3: /* 4K page table with pgstes */
pgtable_page_dtor(page);
- atomic_set(&page->_mapcount, -1);
__free_page(page);
break;
}
--
2.17.0
* [PATCH v4 02/16] mm: Split page_type out from _mapcount
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
We're already using a union of many fields here, so stop abusing
_mapcount and make page_type its own field. That implies renaming some
of the machinery that creates PageBuddy, PageBalloon and PageKmemcg;
we bring back the PG_buddy, PG_balloon and PG_kmemcg names.
As suggested by Kirill, make page_type a bitmask. Because it starts out
life as -1 (thanks to sharing the storage with _mapcount), setting a
page flag means clearing the appropriate bit. This gives us space for
probably twenty or so extra bits (depending on how paranoid we want to
be about _mapcount underflow).
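A worked example of the inverted sense, using the values introduced
below (sketch only, not code from the patch):

/* A freshly initialised page has page_type == 0xffffffff (the old -1),
 * so PAGE_TYPE_BASE is fully present and every type bit reads "clear". */
static void page_type_example(void)
{
	unsigned int type = 0xffffffff;

	type &= ~PG_buddy;	/* __SetPageBuddy(): clear 0x00000080 */
	/* PageBuddy(): the base bits survive and PG_buddy is absent */
	BUG_ON((type & (PAGE_TYPE_BASE | PG_buddy)) != PAGE_TYPE_BASE);

	type |= PG_buddy;	/* __ClearPageBuddy(): set the bit again */
	BUG_ON((type & (PAGE_TYPE_BASE | PG_buddy)) == PAGE_TYPE_BASE);
}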
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm_types.h | 13 ++++++-----
include/linux/page-flags.h | 45 ++++++++++++++++++++++----------------
kernel/crash_core.c | 1 +
mm/page_alloc.c | 13 +++++------
scripts/tags.sh | 6 ++---
5 files changed, 43 insertions(+), 35 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21612347d311..41828fb34860 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -96,6 +96,14 @@ struct page {
};
union {
+ /*
+ * If the page is neither PageSlab nor mappable to userspace,
+ * the value stored here may help determine what this page
+ * is used for. See page-flags.h for a list of page types
+ * which are currently stored here.
+ */
+ unsigned int page_type;
+
_slub_counter_t counters;
unsigned int active; /* SLAB */
struct { /* SLUB */
@@ -109,11 +117,6 @@ struct page {
/*
* Count of ptes mapped in mms, to show when
* page is mapped & limit reverse map searches.
- *
- * Extra information about page type may be
- * stored here for pages that are never mapped,
- * in which case the value MUST BE <= -2.
- * See page-flags.h for more details.
*/
atomic_t _mapcount;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..8c25b28a35aa 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -642,49 +642,56 @@ PAGEFLAG_FALSE(DoubleMap)
#endif
/*
- * For pages that are never mapped to userspace, page->mapcount may be
- * used for storing extra information about page type. Any value used
- * for this purpose must be <= -2, but it's better start not too close
- * to -2 so that an underflow of the page_mapcount() won't be mistaken
- * for a special page.
+ * For pages that are never mapped to userspace (and aren't PageSlab),
+ * page_type may be used. Because it is initialised to -1, we invert the
+ * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
+ * __ClearPageFoo *sets* the bit used for PageFoo. We reserve a few high and
+ * low bits so that an underflow or overflow of page_mapcount() won't be
+ * mistaken for a page type value.
*/
-#define PAGE_MAPCOUNT_OPS(uname, lname) \
+
+#define PAGE_TYPE_BASE 0xf0000000
+/* Reserve 0x0000007f to catch underflows of page_mapcount */
+#define PG_buddy 0x00000080
+#define PG_balloon 0x00000100
+#define PG_kmemcg 0x00000200
+
+#define PageType(page, flag) \
+ ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+
+#define PAGE_TYPE_OPS(uname, lname) \
static __always_inline int Page##uname(struct page *page) \
{ \
- return atomic_read(&page->_mapcount) == \
- PAGE_##lname##_MAPCOUNT_VALUE; \
+ return PageType(page, PG_##lname); \
} \
static __always_inline void __SetPage##uname(struct page *page) \
{ \
- VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); \
- atomic_set(&page->_mapcount, PAGE_##lname##_MAPCOUNT_VALUE); \
+ VM_BUG_ON_PAGE(!PageType(page, 0), page); \
+ page->page_type &= ~PG_##lname; \
} \
static __always_inline void __ClearPage##uname(struct page *page) \
{ \
VM_BUG_ON_PAGE(!Page##uname(page), page); \
- atomic_set(&page->_mapcount, -1); \
+ page->page_type |= PG_##lname; \
}
/*
- * PageBuddy() indicate that the page is free and in the buddy system
+ * PageBuddy() indicates that the page is free and in the buddy system
* (see mm/page_alloc.c).
*/
-#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)
-PAGE_MAPCOUNT_OPS(Buddy, BUDDY)
+PAGE_TYPE_OPS(Buddy, buddy)
/*
- * PageBalloon() is set on pages that are on the balloon page list
+ * PageBalloon() is true for pages that are on the balloon page list
* (see mm/balloon_compaction.c).
*/
-#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
-PAGE_MAPCOUNT_OPS(Balloon, BALLOON)
+PAGE_TYPE_OPS(Balloon, balloon)
/*
* If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
* pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
*/
-#define PAGE_KMEMCG_MAPCOUNT_VALUE (-512)
-PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG)
+PAGE_TYPE_OPS(Kmemcg, kmemcg)
extern bool is_free_buddy_page(struct page *page);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index f7674d676889..b66aced5e8c2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -460,6 +460,7 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_NUMBER(PG_hwpoison);
#endif
VMCOREINFO_NUMBER(PG_head_mask);
+#define PAGE_BUDDY_MAPCOUNT_VALUE (~PG_buddy)
VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
#ifdef CONFIG_HUGETLB_PAGE
VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7eebd6925b10..88e817d7ccef 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -706,16 +706,14 @@ static inline void rmv_page_order(struct page *page)
/*
* This function checks whether a page is free && is the buddy
- * we can do coalesce a page and its buddy if
+ * we can coalesce a page and its buddy if
* (a) the buddy is not in a hole (check before calling!) &&
* (b) the buddy is in the buddy system &&
* (c) a page and its buddy have the same order &&
* (d) a page and its buddy are in the same zone.
*
- * For recording whether a page is in the buddy system, we set ->_mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE.
- * Setting, clearing, and testing _mapcount PAGE_BUDDY_MAPCOUNT_VALUE is
- * serialized by zone->lock.
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
*
* For recording page's order, we use page_private(page).
*/
@@ -760,9 +758,8 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
* as necessary, plus some accounting needed to play nicely with other
* parts of the VM system.
* At each level, we keep a list of pages, which are heads of continuous
- * free pages of length of (1 << order) and marked with _mapcount
- * PAGE_BUDDY_MAPCOUNT_VALUE. Page's order is recorded in page_private(page)
- * field.
+ * free pages of length of (1 << order) and marked with PageBuddy.
+ * Page's order is recorded in page_private(page) field.
* So when we are allocating or freeing one, we can derive the state of the
* other. That is, if we allocate a small block, and both were
* free, the remainder of the region must be split into blocks.
diff --git a/scripts/tags.sh b/scripts/tags.sh
index 78e546ff689c..8c3ae36d4ea8 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -188,9 +188,9 @@ regex_c=(
'/\<CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/ClearPage\1/'
'/\<__CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/__ClearPage\1/'
'/\<TESTCLEARFLAG_FALSE(\([[:alnum:]_]*\).*/TestClearPage\1/'
- '/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/Page\1/'
- '/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
- '/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
+ '/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/Page\1/'
+ '/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__SetPage\1/'
+ '/^PAGE_TYPE_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/'
'/^TASK_PFA_TEST([^,]*, *\([[:alnum:]_]*\))/task_\1/'
'/^TASK_PFA_SET([^,]*, *\([[:alnum:]_]*\))/task_set_\1/'
'/^TASK_PFA_CLEAR([^,]*, *\([[:alnum:]_]*\))/task_clear_\1/'
--
2.17.0
* [PATCH v4 03/16] mm: Mark pages in use for page tables
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
Define a new PageTable bit in page_type and use it to mark pages in
use as page tables. This can be helpful when debugging crashdumps or
analysing memory fragmentation. Add a KPF flag to report these pages
to userspace and update page-types.c to interpret that flag.
Note that only pages currently accounted as NR_PAGETABLE are tracked
as PageTable; this does not include pgd/p4d/pud/pmd pages. Those will
be the subject of a later patch.
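As a usage illustration (a userspace sketch, not part of the patch):
with KPF_PGTABLE exported as bit 26, a tool can test a pfn against
/proc/kpageflags much as page-types.c does:

/* Sketch: does this pfn back a page table?  /proc/kpageflags is an
 * array of u64 flag words indexed by pfn; KPF_PGTABLE is bit 26. */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

static int pfn_is_pgtable(uint64_t pfn)
{
	uint64_t flags;
	int fd = open("/proc/kpageflags", O_RDONLY);

	if (fd < 0)
		return -1;
	if (pread(fd, &flags, sizeof(flags), pfn * sizeof(flags)) !=
	    sizeof(flags)) {
		close(fd);
		return -1;
	}
	close(fd);
	return (flags >> 26) & 1;
}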
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
fs/proc/page.c | 2 ++
include/linux/mm.h | 2 ++
include/linux/page-flags.h | 6 ++++++
include/uapi/linux/kernel-page-flags.h | 2 +-
tools/vm/page-types.c | 1 +
5 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 1491918a33c3..792c78a49174 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -154,6 +154,8 @@ u64 stable_page_flags(struct page *page)
if (PageBalloon(page))
u |= 1 << KPF_BALLOON;
+ if (PageTable(page))
+ u |= 1 << KPF_PGTABLE;
if (page_is_idle(page))
u |= 1 << KPF_IDLE;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 974e8f8ffe03..5c6069219425 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1819,6 +1819,7 @@ static inline bool pgtable_page_ctor(struct page *page)
{
if (!ptlock_init(page))
return false;
+ __SetPageTable(page);
inc_zone_page_state(page, NR_PAGETABLE);
return true;
}
@@ -1826,6 +1827,7 @@ static inline bool pgtable_page_ctor(struct page *page)
static inline void pgtable_page_dtor(struct page *page)
{
pte_lock_deinit(page);
+ __ClearPageTable(page);
dec_zone_page_state(page, NR_PAGETABLE);
}
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8c25b28a35aa..901943e4754b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -655,6 +655,7 @@ PAGEFLAG_FALSE(DoubleMap)
#define PG_buddy 0x00000080
#define PG_balloon 0x00000100
#define PG_kmemcg 0x00000200
+#define PG_table 0x00000400
#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -693,6 +694,11 @@ PAGE_TYPE_OPS(Balloon, balloon)
*/
PAGE_TYPE_OPS(Kmemcg, kmemcg)
+/*
+ * Marks pages in use as page tables.
+ */
+PAGE_TYPE_OPS(Table, table)
+
extern bool is_free_buddy_page(struct page *page);
__PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index fa139841ec18..21b9113c69da 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -35,6 +35,6 @@
#define KPF_BALLOON 23
#define KPF_ZERO_PAGE 24
#define KPF_IDLE 25
-
+#define KPF_PGTABLE 26
#endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index a8783f48f77f..cce853dca691 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -131,6 +131,7 @@ static const char * const page_flag_names[] = {
[KPF_KSM] = "x:ksm",
[KPF_THP] = "t:thp",
[KPF_BALLOON] = "o:balloon",
+ [KPF_PGTABLE] = "g:pgtable",
[KPF_ZERO_PAGE] = "z:zero_page",
[KPF_IDLE] = "i:idle_page",
--
2.17.0
* [PATCH v4 04/16] mm: Switch s_mem and slab_cache in struct page
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
This will allow us to store slub's counters in the same bits as slab's
s_mem. slub now needs to set page->mapping to NULL as it frees the page,
just like slab does.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
include/linux/mm_types.h | 4 ++--
mm/slub.c | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 41828fb34860..e97a310a6abe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -83,7 +83,7 @@ struct page {
/* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
struct address_space *mapping;
- void *s_mem; /* slab first object */
+ struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
};
@@ -194,7 +194,7 @@ struct page {
spinlock_t ptl;
#endif
#endif
- struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
+ void *s_mem; /* slab first object */
};
#ifdef CONFIG_MEMCG
diff --git a/mm/slub.c b/mm/slub.c
index e938184ac847..7fc13c46e975 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1690,6 +1690,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
__ClearPageSlab(page);
page_mapcount_reset(page);
+ page->mapping = NULL;
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += pages;
memcg_uncharge_slab(page, order, s);
--
2.17.0
* [PATCH v4 05/16] mm: Move 'private' union within struct page
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
By moving page->private to the fourth word of struct page, we can put
the SLUB counters in the same word as SLAB's s_mem and still do the
cmpxchg_double trick. Now the SLUB counters no longer overlap with the
mapcount or refcount so we can drop the call to page_mapcount_reset().
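The cmpxchg_double trick requires freelist and the counters word to be
adjacent and to start on a double-word boundary (cmpxchg16b on x86-64
operates on an aligned 16-byte pair). A later patch in this series adds
compile-time checks of exactly this to slub; for reference:

	BUILD_BUG_ON(offsetof(struct page, freelist) + sizeof(void *) !=
		     offsetof(struct page, private));
	BUILD_BUG_ON(offsetof(struct page, freelist) % (2 * sizeof(void *)));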
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm_types.h | 56 ++++++++++++++++++----------------------
mm/slub.c | 1 -
2 files changed, 25 insertions(+), 32 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e97a310a6abe..23378a789af4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -65,15 +65,9 @@ struct hmm;
*/
#ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
#define _struct_page_alignment __aligned(2 * sizeof(unsigned long))
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE)
-#define _slub_counter_t unsigned long
#else
-#define _slub_counter_t unsigned int
-#endif
-#else /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
#define _struct_page_alignment
-#define _slub_counter_t unsigned int
-#endif /* !CONFIG_HAVE_ALIGNED_STRUCT_PAGE */
+#endif
struct page {
/* First double word block */
@@ -95,6 +89,30 @@ struct page {
/* page_deferred_list().prev -- second tail page */
};
+ union {
+ /*
+ * Mapping-private opaque data:
+ * Usually used for buffer_heads if PagePrivate
+ * Used for swp_entry_t if PageSwapCache
+ * Indicates order in the buddy system if PageBuddy
+ */
+ unsigned long private;
+#if USE_SPLIT_PTE_PTLOCKS
+#if ALLOC_SPLIT_PTLOCKS
+ spinlock_t *ptl;
+#else
+ spinlock_t ptl;
+#endif
+#endif
+ void *s_mem; /* slab first object */
+ unsigned long counters; /* SLUB */
+ struct { /* SLUB */
+ unsigned inuse:16;
+ unsigned objects:15;
+ unsigned frozen:1;
+ };
+ };
+
union {
/*
* If the page is neither PageSlab nor mappable to userspace,
@@ -104,13 +122,7 @@ struct page {
*/
unsigned int page_type;
- _slub_counter_t counters;
unsigned int active; /* SLAB */
- struct { /* SLUB */
- unsigned inuse:16;
- unsigned objects:15;
- unsigned frozen:1;
- };
int units; /* SLOB */
struct { /* Page cache */
@@ -179,24 +191,6 @@ struct page {
#endif
};
- union {
- /*
- * Mapping-private opaque data:
- * Usually used for buffer_heads if PagePrivate
- * Used for swp_entry_t if PageSwapCache
- * Indicates order in the buddy system if PageBuddy
- */
- unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
-#if ALLOC_SPLIT_PTLOCKS
- spinlock_t *ptl;
-#else
- spinlock_t ptl;
-#endif
-#endif
- void *s_mem; /* slab first object */
- };
-
#ifdef CONFIG_MEMCG
struct mem_cgroup *mem_cgroup;
#endif
diff --git a/mm/slub.c b/mm/slub.c
index 7fc13c46e975..0b4b58740ed8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1689,7 +1689,6 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
__ClearPageSlabPfmemalloc(page);
__ClearPageSlab(page);
- page_mapcount_reset(page);
page->mapping = NULL;
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += pages;
--
2.17.0
* [PATCH v4 06/16] mm: Move _refcount out of struct page union
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
Keeping the refcount in the union only encourages people to put
something else in the union which will overlap with _refcount and
eventually explode messily. pahole reports that no fields change
location.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm_types.h | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 23378a789af4..9828cd170251 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -113,7 +113,13 @@ struct page {
};
};
- union {
+ union { /* This union is 4 bytes in size. */
+ /*
+ * If the page can be mapped to userspace, encodes the number
+ * of times this page is referenced by a page table.
+ */
+ atomic_t _mapcount;
+
/*
* If the page is neither PageSlab nor mappable to userspace,
* the value stored here may help determine what this page
@@ -124,22 +130,11 @@ struct page {
unsigned int active; /* SLAB */
int units; /* SLOB */
-
- struct { /* Page cache */
- /*
- * Count of ptes mapped in mms, to show when
- * page is mapped & limit reverse map searches.
- */
- atomic_t _mapcount;
-
- /*
- * Usage count, *USE WRAPPER FUNCTION* when manual
- * accounting. See page_ref.h
- */
- atomic_t _refcount;
- };
};
+ /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
+ atomic_t _refcount;
+
/*
* WARNING: bit 0 of the first word encode PageTail(). That means
* the rest users of the storage space MUST NOT use the bit to
--
2.17.0
* [PATCH v4 07/16] slub: Remove page->counters
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
Use page->private instead, now that these two fields are in the same
location. Include a compile-time assert that the fields don't get out
of sync.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
include/linux/mm_types.h | 1 -
mm/slub.c | 68 ++++++++++++++++++----------------------
2 files changed, 31 insertions(+), 38 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9828cd170251..04d9dc442029 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -105,7 +105,6 @@ struct page {
#endif
#endif
void *s_mem; /* slab first object */
- unsigned long counters; /* SLUB */
struct { /* SLUB */
unsigned inuse:16;
unsigned objects:15;
diff --git a/mm/slub.c b/mm/slub.c
index 0b4b58740ed8..04625e3dab13 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -55,8 +55,9 @@
* have the ability to do a cmpxchg_double. It only protects the second
* double word in the page struct. Meaning
* A. page->freelist -> List of object free in a page
- * B. page->counters -> Counters of objects
- * C. page->frozen -> frozen state
+ * B. page->inuse -> Number of objects in use
+ * C. page->objects -> Number of objects in page
+ * D. page->frozen -> frozen state
*
* If a slab is frozen then it is exempt from list management. It is not
* on any list. The processor that froze the slab is the one who can
@@ -358,17 +359,10 @@ static __always_inline void slab_unlock(struct page *page)
static inline void set_page_slub_counters(struct page *page, unsigned long counters_new)
{
- struct page tmp;
- tmp.counters = counters_new;
- /*
- * page->counters can cover frozen/inuse/objects as well
- * as page->_refcount. If we assign to ->counters directly
- * we run the risk of losing updates to page->_refcount, so
- * be careful and only assign to the fields we need.
- */
- page->frozen = tmp.frozen;
- page->inuse = tmp.inuse;
- page->objects = tmp.objects;
+ BUILD_BUG_ON(offsetof(struct page, freelist) + sizeof(void *) !=
+ offsetof(struct page, private));
+ BUILD_BUG_ON(offsetof(struct page, freelist) % (2 * sizeof(void *)));
+ page->private = counters_new;
}
/* Interrupts must be disabled (for the fallback code to work right) */
@@ -381,7 +375,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
if (s->flags & __CMPXCHG_DOUBLE) {
- if (cmpxchg_double(&page->freelist, &page->counters,
+ if (cmpxchg_double(&page->freelist, &page->private,
freelist_old, counters_old,
freelist_new, counters_new))
return true;
@@ -390,7 +384,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
{
slab_lock(page);
if (page->freelist == freelist_old &&
- page->counters == counters_old) {
+ page->private == counters_old) {
page->freelist = freelist_new;
set_page_slub_counters(page, counters_new);
slab_unlock(page);
@@ -417,7 +411,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
if (s->flags & __CMPXCHG_DOUBLE) {
- if (cmpxchg_double(&page->freelist, &page->counters,
+ if (cmpxchg_double(&page->freelist, &page->private,
freelist_old, counters_old,
freelist_new, counters_new))
return true;
@@ -429,7 +423,7 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
local_irq_save(flags);
slab_lock(page);
if (page->freelist == freelist_old &&
- page->counters == counters_old) {
+ page->private == counters_old) {
page->freelist = freelist_new;
set_page_slub_counters(page, counters_new);
slab_unlock(page);
@@ -1787,8 +1781,8 @@ static inline void *acquire_slab(struct kmem_cache *s,
* per cpu allocation list.
*/
freelist = page->freelist;
- counters = page->counters;
- new.counters = counters;
+ counters = page->private;
+ new.private = counters;
*objects = new.objects - new.inuse;
if (mode) {
new.inuse = page->objects;
@@ -1802,7 +1796,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
if (!__cmpxchg_double_slab(s, page,
freelist, counters,
- new.freelist, new.counters,
+ new.freelist, new.private,
"acquire_slab"))
return NULL;
@@ -2049,15 +2043,15 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
do {
prior = page->freelist;
- counters = page->counters;
+ counters = page->private;
set_freepointer(s, freelist, prior);
- new.counters = counters;
+ new.private = counters;
new.inuse--;
VM_BUG_ON(!new.frozen);
} while (!__cmpxchg_double_slab(s, page,
prior, counters,
- freelist, new.counters,
+ freelist, new.private,
"drain percpu freelist"));
freelist = nextfree;
@@ -2080,11 +2074,11 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
redo:
old.freelist = page->freelist;
- old.counters = page->counters;
+ old.private = page->private;
VM_BUG_ON(!old.frozen);
/* Determine target state of the slab */
- new.counters = old.counters;
+ new.private = old.private;
if (freelist) {
new.inuse--;
set_freepointer(s, freelist, old.freelist);
@@ -2145,8 +2139,8 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
l = m;
if (!__cmpxchg_double_slab(s, page,
- old.freelist, old.counters,
- new.freelist, new.counters,
+ old.freelist, old.private,
+ new.freelist, new.private,
"unfreezing slab"))
goto redo;
@@ -2195,17 +2189,17 @@ static void unfreeze_partials(struct kmem_cache *s,
do {
old.freelist = page->freelist;
- old.counters = page->counters;
+ old.private = page->private;
VM_BUG_ON(!old.frozen);
- new.counters = old.counters;
+ new.private = old.private;
new.freelist = old.freelist;
new.frozen = 0;
} while (!__cmpxchg_double_slab(s, page,
- old.freelist, old.counters,
- new.freelist, new.counters,
+ old.freelist, old.private,
+ new.freelist, new.private,
"unfreezing slab"));
if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
@@ -2494,9 +2488,9 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
do {
freelist = page->freelist;
- counters = page->counters;
+ counters = page->private;
- new.counters = counters;
+ new.private = counters;
VM_BUG_ON(!new.frozen);
new.inuse = page->objects;
@@ -2504,7 +2498,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
} while (!__cmpxchg_double_slab(s, page,
freelist, counters,
- NULL, new.counters,
+ NULL, new.private,
"get_freelist"));
return freelist;
@@ -2829,9 +2823,9 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
n = NULL;
}
prior = page->freelist;
- counters = page->counters;
+ counters = page->private;
set_freepointer(s, tail, prior);
- new.counters = counters;
+ new.private = counters;
was_frozen = new.frozen;
new.inuse -= cnt;
if ((!new.inuse || !prior) && !was_frozen) {
@@ -2864,7 +2858,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
} while (!cmpxchg_double_slab(s, page,
prior, counters,
- head, new.counters,
+ head, new.private,
"__slab_free"));
if (likely(!n)) {
--
2.17.0
* [PATCH v4 08/16] mm: Combine first three unions in struct page
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
By combining these three one-word unions into one three-word union,
we make it easier for users to add their own multi-word fields to struct
page, as well as making it obvious that SLUB needs to keep its double-word
alignment for its freelist & counters.
No field moves position; verified with pahole.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm_types.h | 65 ++++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 33 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 04d9dc442029..f0ccb699641d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -70,45 +70,44 @@ struct hmm;
#endif
struct page {
- /* First double word block */
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
- union {
- /* See page-flags.h for the definition of PAGE_MAPPING_FLAGS */
- struct address_space *mapping;
-
- struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
+ union { /* This union is three words (12/24 bytes) in size */
+ struct { /* Page cache and anonymous pages */
+ /* See page-flags.h for PAGE_MAPPING_FLAGS */
+ struct address_space *mapping;
+ pgoff_t index; /* Our offset within mapping. */
+ /**
+ * @private: Mapping-private opaque data.
+ * Usually used for buffer_heads if PagePrivate.
+ * Used for swp_entry_t if PageSwapCache.
+ * Indicates order in the buddy system if PageBuddy.
+ */
+ unsigned long private;
+ };
+ struct { /* slab, slob and slub */
+ struct kmem_cache *slab_cache; /* (slub) */
+ void *freelist; /* first free object (slub) */
+ void *s_mem; /* first object */
+ };
+ struct { /* slub */
+ struct kmem_cache *slub_cache; /* shared with slab */
+ /* Double-word boundary */
+ void *slub_freelist; /* shared with slab */
+ unsigned inuse:16;
+ unsigned objects:15;
+ unsigned frozen:1;
+ };
atomic_t compound_mapcount; /* first tail page */
- /* page_deferred_list().next -- second tail page */
- };
-
- /* Second double word */
- union {
- pgoff_t index; /* Our offset within mapping. */
- void *freelist; /* sl[aou]b first free object */
- /* page_deferred_list().prev -- second tail page */
- };
-
- union {
- /*
- * Mapping-private opaque data:
- * Usually used for buffer_heads if PagePrivate
- * Used for swp_entry_t if PageSwapCache
- * Indicates order in the buddy system if PageBuddy
- */
- unsigned long private;
-#if USE_SPLIT_PTE_PTLOCKS
+ struct list_head deferred_list; /* second tail page */
+ struct { /* Page table pages */
+ unsigned long _pt_pad_2; /* mapping */
+ unsigned long _pt_pad_3;
#if ALLOC_SPLIT_PTLOCKS
- spinlock_t *ptl;
+ spinlock_t *ptl;
#else
- spinlock_t ptl;
-#endif
+ spinlock_t ptl;
#endif
- void *s_mem; /* slab first object */
- struct { /* SLUB */
- unsigned inuse:16;
- unsigned objects:15;
- unsigned frozen:1;
};
};
--
2.17.0
* [PATCH v4 09/16] mm: Use page->deferred_list
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
Now that we can represent the location of 'deferred_list' in C instead
of comments, make use of that ability.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
mm/huge_memory.c | 7 ++-----
mm/page_alloc.c | 2 +-
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a3a1815f8e11..cb0954a6de88 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -483,11 +483,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
static inline struct list_head *page_deferred_list(struct page *page)
{
- /*
- * ->lru in the tail pages is occupied by compound_head.
- * Let's use ->mapping + ->index in the second tail page as list_head.
- */
- return (struct list_head *)&page[2].mapping;
+ /* ->lru in the tail pages is occupied by compound_head. */
+ return &page[2].deferred_list;
}
void prep_transhuge_page(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 88e817d7ccef..18720eccbce1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -953,7 +953,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
case 2:
/*
* the second tail page: ->mapping is
- * page_deferred_list().next -- ignore value.
+ * deferred_list.next -- ignore value.
*/
break;
default:
--
2.17.0
* [PATCH v4 10/16] mm: Move lru union within struct page
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
Since the LRU is two words, this does not affect the double-word
alignment of SLUB's freelist.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm_types.h | 102 +++++++++++++++++++--------------------
1 file changed, 51 insertions(+), 51 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f0ccb699641d..935944c7438d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -72,6 +72,57 @@ struct hmm;
struct page {
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
+ /*
+ * WARNING: bit 0 of the first word encode PageTail(). That means
+ * the rest users of the storage space MUST NOT use the bit to
+ * avoid collision and false-positive PageTail().
+ */
+ union {
+ struct list_head lru; /* Pageout list, eg. active_list
+ * protected by zone_lru_lock !
+ * Can be used as a generic list
+ * by the page owner.
+ */
+ struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
+ * lru or handled by a slab
+ * allocator, this points to the
+ * hosting device page map.
+ */
+ struct { /* slub per cpu partial pages */
+ struct page *next; /* Next partial slab */
+#ifdef CONFIG_64BIT
+ int pages; /* Nr of partial slabs left */
+ int pobjects; /* Approximate # of objects */
+#else
+ short int pages;
+ short int pobjects;
+#endif
+ };
+
+ struct rcu_head rcu_head; /* Used by SLAB
+ * when destroying via RCU
+ */
+ /* Tail pages of compound page */
+ struct {
+ unsigned long compound_head; /* If bit zero is set */
+
+ /* First tail page only */
+ unsigned char compound_dtor;
+ unsigned char compound_order;
+ /* two/six bytes available here */
+ };
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+ struct {
+ unsigned long __pad; /* do not overlay pmd_huge_pte
+ * with compound_head to avoid
+ * possible bit 0 collision.
+ */
+ pgtable_t pmd_huge_pte; /* protected by page->ptl */
+ };
+#endif
+ };
+
union { /* This union is three words (12/24 bytes) in size */
struct { /* Page cache and anonymous pages */
/* See page-flags.h for PAGE_MAPPING_FLAGS */
@@ -133,57 +184,6 @@ struct page {
/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
atomic_t _refcount;
- /*
- * WARNING: bit 0 of the first word encode PageTail(). That means
- * the rest users of the storage space MUST NOT use the bit to
- * avoid collision and false-positive PageTail().
- */
- union {
- struct list_head lru; /* Pageout list, eg. active_list
- * protected by zone_lru_lock !
- * Can be used as a generic list
- * by the page owner.
- */
- struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
- * lru or handled by a slab
- * allocator, this points to the
- * hosting device page map.
- */
- struct { /* slub per cpu partial pages */
- struct page *next; /* Next partial slab */
-#ifdef CONFIG_64BIT
- int pages; /* Nr of partial slabs left */
- int pobjects; /* Approximate # of objects */
-#else
- short int pages;
- short int pobjects;
-#endif
- };
-
- struct rcu_head rcu_head; /* Used by SLAB
- * when destroying via RCU
- */
- /* Tail pages of compound page */
- struct {
- unsigned long compound_head; /* If bit zero is set */
-
- /* First tail page only */
- unsigned char compound_dtor;
- unsigned char compound_order;
- /* two/six bytes available here */
- };
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
- struct {
- unsigned long __pad; /* do not overlay pmd_huge_pte
- * with compound_head to avoid
- * possible bit 0 collision.
- */
- pgtable_t pmd_huge_pte; /* protected by page->ptl */
- };
-#endif
- };
-
#ifdef CONFIG_MEMCG
struct mem_cgroup *mem_cgroup;
#endif
--
2.17.0
* [PATCH v4 11/16] mm: Combine first two unions in struct page
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
This gives us five words of space in a single union in struct page.
The compound_mapcount moves position (from offset 24 to offset 20)
on 64-bit systems, but that does not seem likely to cause any trouble.
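Besides pahole, a throwaway compile-time assertion is an easy way to
double-check such a layout claim (sketch only; the arithmetic assumes
a 64-bit build):

/* flags (8 bytes) + compound_head (8) + compound_dtor (1) +
 * compound_order (1) + 2 bytes of padding puts compound_mapcount at
 * byte 20. */
static void __maybe_unused check_compound_layout(void)
{
	BUILD_BUG_ON(offsetof(struct page, compound_mapcount) != 20);
}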
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm_types.h | 92 ++++++++++++++++++----------------------
mm/page_alloc.c | 2 +-
2 files changed, 43 insertions(+), 51 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 935944c7438d..1d1552767a89 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -73,58 +73,19 @@ struct page {
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
/*
- * WARNING: bit 0 of the first word encode PageTail(). That means
- * the rest users of the storage space MUST NOT use the bit to
+ * Five words (20/40 bytes) are available in this union.
+ * WARNING: bit 0 of the first word is used for PageTail(). That
+ * means the other users of this union MUST NOT use the bit to
* avoid collision and false-positive PageTail().
*/
union {
- struct list_head lru; /* Pageout list, eg. active_list
- * protected by zone_lru_lock !
- * Can be used as a generic list
- * by the page owner.
- */
- struct dev_pagemap *pgmap; /* ZONE_DEVICE pages are never on an
- * lru or handled by a slab
- * allocator, this points to the
- * hosting device page map.
- */
- struct { /* slub per cpu partial pages */
- struct page *next; /* Next partial slab */
-#ifdef CONFIG_64BIT
- int pages; /* Nr of partial slabs left */
- int pobjects; /* Approximate # of objects */
-#else
- short int pages;
- short int pobjects;
-#endif
- };
-
- struct rcu_head rcu_head; /* Used by SLAB
- * when destroying via RCU
- */
- /* Tail pages of compound page */
- struct {
- unsigned long compound_head; /* If bit zero is set */
-
- /* First tail page only */
- unsigned char compound_dtor;
- unsigned char compound_order;
- /* two/six bytes available here */
- };
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
- struct {
- unsigned long __pad; /* do not overlay pmd_huge_pte
- * with compound_head to avoid
- * possible bit 0 collision.
- */
- pgtable_t pmd_huge_pte; /* protected by page->ptl */
- };
-#endif
- };
-
- union { /* This union is three words (12/24 bytes) in size */
struct { /* Page cache and anonymous pages */
+ /**
+ * @lru: Pageout list, eg. active_list protected by
+ * zone_lru_lock. Sometimes used as a generic list
+ * by the page owner.
+ */
+ struct list_head lru;
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
pgoff_t index; /* Our offset within mapping. */
@@ -137,11 +98,20 @@ struct page {
unsigned long private;
};
struct { /* slab, slob and slub */
+ struct list_head slab_list; /* shared with lru */
struct kmem_cache *slab_cache; /* (slub) */
void *freelist; /* first free object (slub) */
void *s_mem; /* first object */
};
struct { /* slub */
+ struct page *next; /* Next partial slab */
+#ifdef CONFIG_64BIT
+ int pages; /* Nr of partial slabs left */
+ int pobjects; /* Approximate # of objects */
+#else
+ short int pages;
+ short int pobjects;
+#endif
struct kmem_cache *slub_cache; /* shared with slab */
/* Double-word boundary */
void *slub_freelist; /* shared with slab */
@@ -149,9 +119,22 @@ struct page {
unsigned objects:15;
unsigned frozen:1;
};
- atomic_t compound_mapcount; /* first tail page */
- struct list_head deferred_list; /* second tail page */
+ struct { /* Tail pages of compound page */
+ unsigned long compound_head; /* Bit zero is set */
+
+ /* First tail page only */
+ unsigned char compound_dtor;
+ unsigned char compound_order;
+ atomic_t compound_mapcount;
+ };
+ struct { /* Second tail page of compound page */
+ unsigned long _compound_pad_1; /* compound_head */
+ unsigned long _compound_pad_2;
+ struct list_head deferred_list;
+ };
struct { /* Page table pages */
+ unsigned long _pt_pad_1; /* compound_head */
+ pgtable_t pmd_huge_pte; /* protected by page->ptl */
unsigned long _pt_pad_2; /* mapping */
unsigned long _pt_pad_3;
#if ALLOC_SPLIT_PTLOCKS
@@ -160,6 +143,15 @@ struct page {
spinlock_t ptl;
#endif
};
+
+ /** @rcu_head: You can use this to free a page by RCU. */
+ struct rcu_head rcu_head;
+
+ /**
+ * @pgmap: For ZONE_DEVICE pages, this points to the hosting
+ * device page map.
+ */
+ struct dev_pagemap *pgmap;
};
union { /* This union is 4 bytes in size. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 18720eccbce1..d1e4df7d57bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -944,7 +944,7 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
}
switch (page - head_page) {
case 1:
- /* the first tail page: ->mapping is compound_mapcount() */
+ /* the first tail page: ->mapping may be compound_mapcount() */
if (unlikely(compound_mapcount(page))) {
bad_page(page, "nonzero compound_mapcount", 0);
goto out;
--
2.17.0
* [PATCH v4 12/16] mm: Improve struct page documentation
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
Rewrite the documentation to describe what you can use in struct
page rather than what you can't.
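As an illustration of the rewritten rules, consider a hypothetical
page owner (not from the patch):

/* Hypothetical owner of a page from alloc_pages(): page->private is
 * one of the five available words and is not the word carrying the
 * PageTail bit, so any value may be stored in it.  We never touch
 * page->mapping, so nothing needs restoring before the page is freed. */
struct my_tracker;

static void my_adopt_page(struct page *page, struct my_tracker *t)
{
	set_page_private(page, (unsigned long)t);
}

static struct my_tracker *my_page_tracker(struct page *page)
{
	return (struct my_tracker *)page_private(page);
}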
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
---
include/linux/mm_types.h | 40 +++++++++++++++++++---------------------
1 file changed, 19 insertions(+), 21 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1d1552767a89..e0e74e91f3e8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -33,29 +33,27 @@ struct hmm;
* it to keep track of whatever it is we are using the page for at the
* moment. Note that we have no way to track which tasks are using
* a page, though if it is a pagecache page, rmap structures can tell us
- * who is mapping it. If you allocate the page using alloc_pages(), you
- * can use some of the space in struct page for your own purposes.
+ * who is mapping it.
*
- * Pages that were once in the page cache may be found under the RCU lock
- * even after they have been recycled to a different purpose. The page
- * cache reads and writes some of the fields in struct page to pin the
- * page before checking that it's still in the page cache. It is vital
- * that all users of struct page:
- * 1. Use the first word as PageFlags.
- * 2. Clear or preserve bit 0 of page->compound_head. It is used as
- * PageTail for compound pages, and the page cache must not see false
- * positives. Some users put a pointer here (guaranteed to be at least
- * 4-byte aligned), other users avoid using the field altogether.
- * 3. page->_refcount must either not be used, or must be used in such a
- * way that other CPUs temporarily incrementing and then decrementing the
- * refcount does not cause problems. On receiving the page from
- * alloc_pages(), the refcount will be positive.
- * 4. Either preserve page->_mapcount or restore it to -1 before freeing it.
+ * If you allocate the page using alloc_pages(), you can use some of the
+ * space in struct page for your own purposes. The five words in the first
+ * union are available, except for bit 0 of the first word which must be
+ * kept clear. Many users use this word to store a pointer to an object
+ * which is guaranteed to be aligned. If you use the same storage as
+ * page->mapping, you must restore it to NULL before freeing the page.
*
- * If you allocate pages of order > 0, you can use the fields in the struct
- * page associated with each page, but bear in mind that the pages may have
- * been inserted individually into the page cache, so you must use the above
- * four fields in a compatible way for each struct page.
+ * If your page will not be mapped to userspace, you can also use the 4
+ * bytes in the second union, but you must call page_mapcount_reset()
+ * before freeing it.
+ *
+ * If you want to use the refcount field, it must be used in such a way
+ * that other CPUs temporarily incrementing and then decrementing the
+ * refcount does not cause problems. On receiving the page from
+ * alloc_pages(), the refcount will be positive.
+ *
+ * If you allocate pages of order > 0, you can use some of the fields
+ * in each subpage, but you may need to restore some of their values
+ * afterwards.
*
* SLUB uses cmpxchg_double() to atomically update its freelist and
* counters. That requires that freelist & counters be adjacent and
--
2.17.0
* [PATCH v4 13/16] mm: Add pt_mm to struct page
From: Matthew Wilcox @ 2018-04-30 20:22 UTC
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
x86 overloads the page->index field to store a pointer to the mm_struct.
Rename this to pt_mm so it's visible to other users.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
arch/x86/mm/pgtable.c | 5 ++---
include/linux/mm_types.h | 2 +-
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..938dbcd46b97 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -114,13 +114,12 @@ static inline void pgd_list_del(pgd_t *pgd)
static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
{
- BUILD_BUG_ON(sizeof(virt_to_page(pgd)->index) < sizeof(mm));
- virt_to_page(pgd)->index = (pgoff_t)mm;
+ virt_to_page(pgd)->pt_mm = mm;
}
struct mm_struct *pgd_page_get_mm(struct page *page)
{
- return (struct mm_struct *)page->index;
+ return page->pt_mm;
}
static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e0e74e91f3e8..0e6117123737 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -134,7 +134,7 @@ struct page {
unsigned long _pt_pad_1; /* compound_head */
pgtable_t pmd_huge_pte; /* protected by page->ptl */
unsigned long _pt_pad_2; /* mapping */
- unsigned long _pt_pad_3;
+ struct mm_struct *pt_mm;
#if ALLOC_SPLIT_PTLOCKS
spinlock_t *ptl;
#else
--
2.17.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v4 14/16] mm: Add hmm_data to struct page
2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
` (12 preceding siblings ...)
2018-04-30 20:22 ` [PATCH v4 13/16] mm: Add pt_mm to struct page Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
15 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
Make hmm_data an explicit member of the struct page union.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
include/linux/hmm.h | 8 ++------
include/linux/mm_types.h | 14 +++++++++-----
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 39988924de3a..91c1b2dccbbb 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -522,9 +522,7 @@ void hmm_devmem_remove(struct hmm_devmem *devmem);
static inline void hmm_devmem_page_set_drvdata(struct page *page,
unsigned long data)
{
- unsigned long *drvdata = (unsigned long *)&page->pgmap;
-
- drvdata[1] = data;
+ page->hmm_data = data;
}
/*
@@ -535,9 +533,7 @@ static inline void hmm_devmem_page_set_drvdata(struct page *page,
*/
static inline unsigned long hmm_devmem_page_get_drvdata(const struct page *page)
{
- const unsigned long *drvdata = (const unsigned long *)&page->pgmap;
-
- return drvdata[1];
+ return page->hmm_data;
}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0e6117123737..42619e16047f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -145,11 +145,15 @@ struct page {
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
- /**
- * @pgmap: For ZONE_DEVICE pages, this points to the hosting
- * device page map.
- */
- struct dev_pagemap *pgmap;
+ struct {
+ /**
+ * @pgmap: For ZONE_DEVICE pages, this points to the
+ * hosting device page map.
+ */
+ struct dev_pagemap *pgmap;
+ unsigned long hmm_data;
+ unsigned long _zd_pad_1; /* uses mapping */
+ };
};
union { /* This union is 4 bytes in size. */
--
2.17.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
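As a hedged illustration of what drivers gain: struct my_chunk and the
two wrappers below are hypothetical, but they use the accessors from
the diff above, which now read and write an explicitly named field
instead of indexing past ->pgmap:

	struct my_chunk;	/* hypothetical per-page driver state */

	static void my_chunk_attach(struct page *page, struct my_chunk *chunk)
	{
		hmm_devmem_page_set_drvdata(page, (unsigned long)chunk);
	}

	static struct my_chunk *my_chunk_of(const struct page *page)
	{
		return (struct my_chunk *)hmm_devmem_page_get_drvdata(page);
	}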
* [PATCH v4 15/16] slab,slub: Remove rcu_head size checks
2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
` (13 preceding siblings ...)
2018-04-30 20:22 ` [PATCH v4 14/16] mm: Add hmm_data " Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
2018-05-01 16:46 ` Christopher Lameter
2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
15 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
rcu_head may now grow larger than list_head without affecting slab or
slub.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
mm/slab.c | 2 --
mm/slub.c | 27 ++-------------------------
2 files changed, 2 insertions(+), 27 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index e387a17d6d56..e6ab1327db25 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1235,8 +1235,6 @@ void __init kmem_cache_init(void)
{
int i;
- BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
- sizeof(struct rcu_head));
kmem_cache = &kmem_cache_boot;
if (!IS_ENABLED(CONFIG_NUMA) || num_possible_nodes() == 1)
diff --git a/mm/slub.c b/mm/slub.c
index 04625e3dab13..27cc2956acba 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1690,17 +1690,9 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
__free_pages(page, order);
}
-#define need_reserve_slab_rcu \
- (sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
-
static void rcu_free_slab(struct rcu_head *h)
{
- struct page *page;
-
- if (need_reserve_slab_rcu)
- page = virt_to_head_page(h);
- else
- page = container_of((struct list_head *)h, struct page, lru);
+ struct page *page = container_of(h, struct page, rcu_head);
__free_slab(page->slab_cache, page);
}
@@ -1708,19 +1700,7 @@ static void rcu_free_slab(struct rcu_head *h)
static void free_slab(struct kmem_cache *s, struct page *page)
{
if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) {
- struct rcu_head *head;
-
- if (need_reserve_slab_rcu) {
- int order = compound_order(page);
- int offset = (PAGE_SIZE << order) - s->reserved;
-
- VM_BUG_ON(s->reserved != sizeof(*head));
- head = page_address(page) + offset;
- } else {
- head = &page->rcu_head;
- }
-
- call_rcu(head, rcu_free_slab);
+ call_rcu(&page->rcu_head, rcu_free_slab);
} else
__free_slab(s, page);
}
@@ -3587,9 +3567,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
s->random = get_random_long();
#endif
- if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
- s->reserved = sizeof(struct rcu_head);
-
if (!calculate_sizes(s, -1))
goto error;
if (disable_higher_order_debug) {
--
2.17.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
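Spelling out why the checks could go (my reading of the series, not
text from the commit): both structures happen to be two words today,
and struct page now carries a dedicated rcu_head member, so a future,
larger rcu_head would simply occupy more of the five-word union rather
than overflow page->lru:

	/* As currently defined: */
	struct list_head { struct list_head *next, *prev; };
	struct callback_head {		/* aka struct rcu_head */
		struct callback_head *next;
		void (*func)(struct callback_head *head);
	};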
* [PATCH v4 16/16] slub: Remove kmem_cache->reserved
2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
` (14 preceding siblings ...)
2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
@ 2018-04-30 20:22 ` Matthew Wilcox
2018-05-01 16:43 ` Christopher Lameter
15 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-04-30 20:22 UTC (permalink / raw)
To: linux-mm
Cc: Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
From: Matthew Wilcox <mawilcox@microsoft.com>
The reserved field was only used for embedding an rcu_head in the data
structure. With the previous commit, we no longer need it. That lets
us remove the 'reserved' argument to a lot of functions.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
include/linux/slub_def.h | 1 -
mm/slub.c | 41 ++++++++++++++++++++--------------------
2 files changed, 20 insertions(+), 22 deletions(-)
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 3773e26c08c1..09fa2c6f0e68 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -101,7 +101,6 @@ struct kmem_cache {
void (*ctor)(void *);
unsigned int inuse; /* Offset to metadata */
unsigned int align; /* Alignment */
- unsigned int reserved; /* Reserved bytes at the end of slabs */
unsigned int red_left_pad; /* Left redzone padding size */
const char *name; /* Name (only for display!) */
struct list_head list; /* List of slab caches */
diff --git a/mm/slub.c b/mm/slub.c
index 27cc2956acba..01c2183aa3d7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -317,16 +317,16 @@ static inline unsigned int slab_index(void *p, struct kmem_cache *s, void *addr)
return (p - addr) / s->size;
}
-static inline unsigned int order_objects(unsigned int order, unsigned int size, unsigned int reserved)
+static inline unsigned int order_objects(unsigned int order, unsigned int size)
{
- return (((unsigned int)PAGE_SIZE << order) - reserved) / size;
+ return ((unsigned int)PAGE_SIZE << order) / size;
}
static inline struct kmem_cache_order_objects oo_make(unsigned int order,
- unsigned int size, unsigned int reserved)
+ unsigned int size)
{
struct kmem_cache_order_objects x = {
- (order << OO_SHIFT) + order_objects(order, size, reserved)
+ (order << OO_SHIFT) + order_objects(order, size)
};
return x;
@@ -841,7 +841,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
return 1;
start = page_address(page);
- length = (PAGE_SIZE << compound_order(page)) - s->reserved;
+ length = PAGE_SIZE << compound_order(page);
end = start + length;
remainder = length % s->size;
if (!remainder)
@@ -930,7 +930,7 @@ static int check_slab(struct kmem_cache *s, struct page *page)
return 0;
}
- maxobj = order_objects(compound_order(page), s->size, s->reserved);
+ maxobj = order_objects(compound_order(page), s->size);
if (page->objects > maxobj) {
slab_err(s, page, "objects %u > max %u",
page->objects, maxobj);
@@ -980,7 +980,7 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
nr++;
}
- max_objects = order_objects(compound_order(page), s->size, s->reserved);
+ max_objects = order_objects(compound_order(page), s->size);
if (max_objects > MAX_OBJS_PER_PAGE)
max_objects = MAX_OBJS_PER_PAGE;
@@ -3197,21 +3197,21 @@ static unsigned int slub_min_objects;
*/
static inline unsigned int slab_order(unsigned int size,
unsigned int min_objects, unsigned int max_order,
- unsigned int fract_leftover, unsigned int reserved)
+ unsigned int fract_leftover)
{
unsigned int min_order = slub_min_order;
unsigned int order;
- if (order_objects(min_order, size, reserved) > MAX_OBJS_PER_PAGE)
+ if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
return get_order(size * MAX_OBJS_PER_PAGE) - 1;
- for (order = max(min_order, (unsigned int)get_order(min_objects * size + reserved));
+ for (order = max(min_order, (unsigned int)get_order(min_objects * size));
order <= max_order; order++) {
unsigned int slab_size = (unsigned int)PAGE_SIZE << order;
unsigned int rem;
- rem = (slab_size - reserved) % size;
+ rem = slab_size % size;
if (rem <= slab_size / fract_leftover)
break;
@@ -3220,7 +3220,7 @@ static inline unsigned int slab_order(unsigned int size,
return order;
}
-static inline int calculate_order(unsigned int size, unsigned int reserved)
+static inline int calculate_order(unsigned int size)
{
unsigned int order;
unsigned int min_objects;
@@ -3237,7 +3237,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
min_objects = slub_min_objects;
if (!min_objects)
min_objects = 4 * (fls(nr_cpu_ids) + 1);
- max_objects = order_objects(slub_max_order, size, reserved);
+ max_objects = order_objects(slub_max_order, size);
min_objects = min(min_objects, max_objects);
while (min_objects > 1) {
@@ -3246,7 +3246,7 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
fraction = 16;
while (fraction >= 4) {
order = slab_order(size, min_objects,
- slub_max_order, fraction, reserved);
+ slub_max_order, fraction);
if (order <= slub_max_order)
return order;
fraction /= 2;
@@ -3258,14 +3258,14 @@ static inline int calculate_order(unsigned int size, unsigned int reserved)
* We were unable to place multiple objects in a slab. Now
* lets see if we can place a single object there.
*/
- order = slab_order(size, 1, slub_max_order, 1, reserved);
+ order = slab_order(size, 1, slub_max_order, 1);
if (order <= slub_max_order)
return order;
/*
* Doh this slab cannot be placed using slub_max_order.
*/
- order = slab_order(size, 1, MAX_ORDER, 1, reserved);
+ order = slab_order(size, 1, MAX_ORDER, 1);
if (order < MAX_ORDER)
return order;
return -ENOSYS;
@@ -3533,7 +3533,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
if (forced_order >= 0)
order = forced_order;
else
- order = calculate_order(size, s->reserved);
+ order = calculate_order(size);
if ((int)order < 0)
return 0;
@@ -3551,8 +3551,8 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
/*
* Determine the number of objects per slab
*/
- s->oo = oo_make(order, size, s->reserved);
- s->min = oo_make(get_order(size), size, s->reserved);
+ s->oo = oo_make(order, size);
+ s->min = oo_make(get_order(size), size);
if (oo_objects(s->oo) > oo_objects(s->max))
s->max = s->oo;
@@ -3562,7 +3562,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
{
s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
- s->reserved = 0;
#ifdef CONFIG_SLAB_FREELIST_HARDENED
s->random = get_random_long();
#endif
@@ -5106,7 +5105,7 @@ SLAB_ATTR_RO(destroy_by_rcu);
static ssize_t reserved_show(struct kmem_cache *s, char *buf)
{
- return sprintf(buf, "%u\n", s->reserved);
+ return sprintf(buf, "0\n");
}
SLAB_ATTR_RO(reserved);
--
2.17.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
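A worked example of the simplified arithmetic, assuming 4 KiB pages
(the numbers are illustrative, not from the patch):

	unsigned int order = 1, size = 96;
	unsigned int slab_size = PAGE_SIZE << order;	/* 8192 */
	unsigned int objects = slab_size / size;	/* order_objects(): 85 */
	unsigned int rem = slab_size % size;		/* 8192 - 85 * 96 = 32 */
	/* slab_order() accepts this order because rem (32) is no more
	 * than slab_size / fract_leftover (8192 / 16 = 512 on the
	 * first pass, where fraction starts at 16). */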
* Re: [PATCH v4 16/16] slub: Remove kmem_cache->reserved
2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
@ 2018-05-01 16:43 ` Christopher Lameter
0 siblings, 0 replies; 30+ messages in thread
From: Christopher Lameter @ 2018-05-01 16:43 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Mon, 30 Apr 2018, Matthew Wilcox wrote:
> The reserved field was only used for embedding an rcu_head in the data
> structure. With the previous commit, we no longer need it. That lets
> us remove the 'reserved' argument to a lot of functions.
Great work!
Acked-by: Christoph Lameter <cl@linux.com>
> @@ -5106,7 +5105,7 @@ SLAB_ATTR_RO(destroy_by_rcu);
>
> static ssize_t reserved_show(struct kmem_cache *s, char *buf)
> {
> - return sprintf(buf, "%u\n", s->reserved);
> + return sprintf(buf, "0\n");
> }
> SLAB_ATTR_RO(reserved);
Hmmm... Maybe it's better if you remove the reserved file from sysfs
instead? I doubt anyone was using it.
^ permalink raw reply [flat|nested] 30+ messages in thread
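Removing the file would look roughly like this (a sketch;
SLAB_ATTR_RO(reserved) generates reserved_attr, which also appears in
the slab_attrs[] array in mm/slub.c):

	-static ssize_t reserved_show(struct kmem_cache *s, char *buf)
	-{
	-	return sprintf(buf, "0\n");
	-}
	-SLAB_ATTR_RO(reserved);
	...
	-	&reserved_attr.attr,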
* Re: [PATCH v4 15/16] slab,slub: Remove rcu_head size checks
2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
@ 2018-05-01 16:46 ` Christopher Lameter
0 siblings, 0 replies; 30+ messages in thread
From: Christopher Lameter @ 2018-05-01 16:46 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Mon, 30 Apr 2018, Matthew Wilcox wrote:
> rcu_head may now grow larger than list_head without affecting slab or
> slub.
Acked-by: Christoph Lameter <cl@linux.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-04-30 20:22 ` [PATCH v4 07/16] slub: Remove page->counters Matthew Wilcox
@ 2018-05-01 16:48 ` Christopher Lameter
2018-05-02 17:26 ` Matthew Wilcox
0 siblings, 1 reply; 30+ messages in thread
From: Christopher Lameter @ 2018-05-01 16:48 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Mon, 30 Apr 2018, Matthew Wilcox wrote:
> Use page->private instead, now that these two fields are in the same
> location. Include a compile-time assert that the fields don't get out
> of sync.
Hrm. This makes the source code a bit less readable. Guess it's ok.
Acked-by: Christoph Lameter <cl@linux.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 13/16] mm: Add pt_mm to struct page
2018-04-30 20:22 ` [PATCH v4 13/16] mm: Add pt_mm to struct page Matthew Wilcox
@ 2018-05-02 8:12 ` Kirill A. Shutemov
2018-05-03 0:11 ` Matthew Wilcox
0 siblings, 1 reply; 30+ messages in thread
From: Kirill A. Shutemov @ 2018-05-02 8:12 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
On Mon, Apr 30, 2018 at 01:22:44PM -0700, Matthew Wilcox wrote:
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index e0e74e91f3e8..0e6117123737 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -134,7 +134,7 @@ struct page {
> unsigned long _pt_pad_1; /* compound_head */
> pgtable_t pmd_huge_pte; /* protected by page->ptl */
> unsigned long _pt_pad_2; /* mapping */
> - unsigned long _pt_pad_3;
> + struct mm_struct *pt_mm;
I guess it's worth having a comment that this field is only used for pgd
page tables and therefore doesn't conflict with pmd_huge_pte.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-01 16:48 ` Christopher Lameter
@ 2018-05-02 17:26 ` Matthew Wilcox
2018-05-02 22:17 ` Kirill A. Shutemov
0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-02 17:26 UTC (permalink / raw)
To: Christopher Lameter
Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Tue, May 01, 2018 at 11:48:53AM -0500, Christopher Lameter wrote:
> On Mon, 30 Apr 2018, Matthew Wilcox wrote:
>
> > Use page->private instead, now that these two fields are in the same
> > location. Include a compile-time assert that the fields don't get out
> > of sync.
>
> Hrm. This makes the source code a bit less readable. Guess it's ok.
>
> Acked-by: Christoph Lameter <cl@linux.com>
Thanks for the ACK. I'm not thrilled with this particular patch, but
I'm not thrilled with any of the other options we've come up with either.
Option 1:
Patch as written.
Pro: Keeps struct page simple
Con: Hidden dependency on page->private and page->inuse being in the same bits
Option 2:
@@ -113,9 +113,14 @@ struct page {
struct kmem_cache *slub_cache; /* shared with slab */
/* Double-word boundary */
void *slub_freelist; /* shared with slab */
- unsigned inuse:16;
- unsigned objects:15;
- unsigned frozen:1;
+ union {
+ unsigned long counters;
+ struct {
+ unsigned inuse:16;
+ unsigned objects:15;
+ unsigned frozen:1;
+ };
+ };
};
struct { /* Tail pages of compound page */
unsigned long compound_head; /* Bit zero is set */
Pro: Expresses exactly what we do.
Con: Back to five levels of indentation in struct page
Option 3: Use -fms-extensions to create a slub_page structure.
Pro: Indentation reduced to minimum and no cross-union dependencies
Con: Nobody seemed interested in the idea
Option 4: Use explicit shifting-and-masking to combine the three counters
into one word.
Con: Lots of churn.
^ permalink raw reply [flat|nested] 30+ messages in thread
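For comparison, option 4 might look something like this sketch (no
such patch has been written; the bit assignments below are mine):

	/* Pack the three counters into one word by hand:
	 * inuse in bits 0-15, objects in bits 16-30, frozen in bit 31. */
	#define SLUB_INUSE(counters)	((counters) & 0xffff)
	#define SLUB_OBJECTS(counters)	(((counters) >> 16) & 0x7fff)
	#define SLUB_FROZEN(counters)	(((counters) >> 31) & 1)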
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-02 17:26 ` Matthew Wilcox
@ 2018-05-02 22:17 ` Kirill A. Shutemov
2018-05-03 0:52 ` Matthew Wilcox
0 siblings, 1 reply; 30+ messages in thread
From: Kirill A. Shutemov @ 2018-05-02 22:17 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christopher Lameter, linux-mm, Matthew Wilcox, Andrew Morton,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Wed, May 02, 2018 at 05:26:39PM +0000, Matthew Wilcox wrote:
> Option 2:
>
> @@ -113,9 +113,14 @@ struct page {
> struct kmem_cache *slub_cache; /* shared with slab */
> /* Double-word boundary */
> void *slub_freelist; /* shared with slab */
> - unsigned inuse:16;
> - unsigned objects:15;
> - unsigned frozen:1;
> + union {
> + unsigned long counters;
> + struct {
> + unsigned inuse:16;
> + unsigned objects:15;
> + unsigned frozen:1;
> + };
> + };
> };
> struct { /* Tail pages of compound page */
> unsigned long compound_head; /* Bit zero is set */
>
> Pro: Expresses exactly what we do.
> Con: Back to five levels of indentation in struct page
The indentation issue can be fixed (to some extent) by declaring the union
outside struct page and just using it inside.
I don't advocate for the approach, just listing the option.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 13/16] mm: Add pt_mm to struct page
2018-05-02 8:12 ` Kirill A. Shutemov
@ 2018-05-03 0:11 ` Matthew Wilcox
0 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-03 0:11 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: linux-mm, Matthew Wilcox, Andrew Morton, Kirill A . Shutemov,
Christoph Lameter, Lai Jiangshan, Pekka Enberg, Vlastimil Babka,
Dave Hansen, Jérôme Glisse
On Wed, May 02, 2018 at 11:12:17AM +0300, Kirill A. Shutemov wrote:
> On Mon, Apr 30, 2018 at 01:22:44PM -0700, Matthew Wilcox wrote:
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index e0e74e91f3e8..0e6117123737 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -134,7 +134,7 @@ struct page {
> > unsigned long _pt_pad_1; /* compound_head */
> > pgtable_t pmd_huge_pte; /* protected by page->ptl */
> > unsigned long _pt_pad_2; /* mapping */
> > - unsigned long _pt_pad_3;
> > + struct mm_struct *pt_mm;
>
> I guess it's worth having a comment that this field is only used for pgd
> page tables and therefore doesn't conflict with pmd_huge_pte.
Actually, it doesn't conflict with pmd_huge_pte -- it's in different
bits (both before and after this patch). What does 'conflict' with
pmd_huge_pte is the use of page->lru in the pgd. I have a plan to
eliminate that use of pgd->lru, but I need to do a couple of other
things first.
^ permalink raw reply [flat|nested] 30+ messages in thread
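A word-by-word sketch of the aliasing under discussion, per the
five-word union earlier in the series (offsets are my summary, worth
double-checking against mm_types.h):

	/*
	 * word 0: lru.next | compound_head | _pt_pad_1
	 * word 1: lru.prev | pmd_huge_pte    <- pgd's use of page->lru
	 *                                       clobbers this word
	 * word 2: mapping  | _pt_pad_2
	 * word 3: index    | pt_mm            <- different word, no conflict
	 * word 4: private  | ptl (or a pointer to it)
	 */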
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-02 22:17 ` Kirill A. Shutemov
@ 2018-05-03 0:52 ` Matthew Wilcox
2018-05-03 15:03 ` Christopher Lameter
0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-03 0:52 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Christopher Lameter, linux-mm, Matthew Wilcox, Andrew Morton,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Thu, May 03, 2018 at 01:17:02AM +0300, Kirill A. Shutemov wrote:
> On Wed, May 02, 2018 at 05:26:39PM +0000, Matthew Wilcox wrote:
> > Option 2:
> > + union {
> > + unsigned long counters;
> > + struct {
> > + unsigned inuse:16;
> > + unsigned objects:15;
> > + unsigned frozen:1;
> > + };
> > + };
> >
> > Pro: Expresses exactly what we do.
> > Con: Back to five levels of indentation in struct page
>
> The indentation issue can be fixed (to some extent) by declaring the union
> outside struct page and just using it inside.
>
> I don't advocate for the approach, just listing the option.
Actually, you can't have an anonymous tagged union without -fms-extensions
(which got zero comments when I proposed it to lkml) or -fplan9-extensions
(which would require gcc 4.6).
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-03 0:52 ` Matthew Wilcox
@ 2018-05-03 15:03 ` Christopher Lameter
2018-05-03 18:28 ` Matthew Wilcox
0 siblings, 1 reply; 30+ messages in thread
From: Christopher Lameter @ 2018-05-03 15:03 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Wed, 2 May 2018, Matthew Wilcox wrote:
> > > Option 2:
> > > + union {
> > > + unsigned long counters;
> > > + struct {
> > > + unsigned inuse:16;
> > > + unsigned objects:15;
> > > + unsigned frozen:1;
> > > + };
> > > + };
> > >
> > > Pro: Expresses exactly what we do.
> > > Con: Back to five levels of indentation in struct page
I like that better. Improves readability of the code using struct page. I
think that is more important than the actual definition of struct page.
Given the overload situation this will require some deep
thought for newbies anyway. ;-)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-03 15:03 ` Christopher Lameter
@ 2018-05-03 18:28 ` Matthew Wilcox
2018-05-04 14:55 ` Christopher Lameter
0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-03 18:28 UTC (permalink / raw)
To: Christopher Lameter
Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Thu, May 03, 2018 at 10:03:10AM -0500, Christopher Lameter wrote:
> On Wed, 2 May 2018, Matthew Wilcox wrote:
>
> > > > Option 2:
> > > > + union {
> > > > + unsigned long counters;
> > > > + struct {
> > > > + unsigned inuse:16;
> > > > + unsigned objects:15;
> > > > + unsigned frozen:1;
> > > > + };
> > > > + };
> > > >
> > > > Pro: Expresses exactly what we do.
> > > > Con: Back to five levels of indentation in struct page
>
> I like that better. Improves readability of the code using struct page. I
> think that is more important than the actual definition of struct page.
OK. Do you want the conversion of slub to using slub_freelist and slub_list
as part of this patch series as well, then?
The end result looks like this, btw:
struct { /* slub */
union {
struct list_head slub_list;
struct {
struct page *next; /* Next partial */
#ifdef CONFIG_64BIT
int pages; /* Nr of pages left */
int pobjects; /* Apprx # of objects */
#else
short int pages;
short int pobjects;
#endif
};
};
struct kmem_cache *slub_cache; /* shared with slab */
/* Double-word boundary */
void *slub_freelist; /* shared with slab */
union {
unsigned long counters;
struct {
unsigned inuse:16;
unsigned objects:15;
unsigned frozen:1;
};
};
};
Oh, and what do you want to do about cache_from_obj() in mm/slab.h?
That relies on having slab_cache be in the same location in struct
page as slub_cache. Maybe something like this?
page = virt_to_head_page(x);
#ifdef CONFIG_SLUB
cachep = page->slub_cache;
#else
cachep = page->slab_cache;
#endif
if (slab_equal_or_root(cachep, s))
return cachep;
> Given the overload situation this will require some deep
> thought for newbies anyway. ;-)
Yes, it's all quite entangled.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-03 18:28 ` Matthew Wilcox
@ 2018-05-04 14:55 ` Christopher Lameter
2018-05-04 15:15 ` Matthew Wilcox
0 siblings, 1 reply; 30+ messages in thread
From: Christopher Lameter @ 2018-05-04 14:55 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Thu, 3 May 2018, Matthew Wilcox wrote:
> OK. Do you want the conversion of slub to using slub_freelist and slub_list
> as part of this patch series as well, then?
Not sure that is needed. Don't like allocator-specific names.
> Oh, and what do you want to do about cache_from_obj() in mm/slab.h?
> That relies on having slab_cache be in the same location in struct
> page as slub_cache. Maybe something like this?
>
> page = virt_to_head_page(x);
> #ifdef CONFIG_SLUB
> cachep = page->slub_cache;
> #else
> cachep = page->slab_cache;
> #endif
> if (slab_equal_or_root(cachep, s))
> return cachep;
Name the field "cache" instead of sl?b_cache?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-04 14:55 ` Christopher Lameter
@ 2018-05-04 15:15 ` Matthew Wilcox
2018-05-04 16:29 ` Christopher Lameter
0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-05-04 15:15 UTC (permalink / raw)
To: Christopher Lameter
Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Fri, May 04, 2018 at 09:55:30AM -0500, Christopher Lameter wrote:
> On Thu, 3 May 2018, Matthew Wilcox wrote:
>
> > OK. Do you want the conversion of slub to using slub_freelist and slub_list
> > as part of this patch series as well, then?
>
> Not sure that is needed. Don't like allocator-specific names.
So you'd rather have one union that's used for slab/slob/slub? Like this?
struct { /* slab, slob and slub */
union {
struct list_head slab_list;
struct { /* Partial pages */
struct page *next;
#ifdef CONFIG_64BIT
int pages; /* Nr of pages left */
int pobjects; /* Approximate count */
#else
short int pages;
short int pobjects;
#endif
};
};
struct kmem_cache *slab_cache;
/* Double-word boundary */
void *freelist; /* first free object */
union {
void *s_mem; /* first object (slab only) */
unsigned long counters; /* slub */
struct {
unsigned inuse:16;
unsigned objects:15;
unsigned frozen:1;
};
};
};
^ permalink raw reply [flat|nested] 30+ messages in thread
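If the shared list member does end up named slab_list, the call sites
would change along these lines (a hypothetical follow-up, not part of
this series):

	-	list_add(&page->lru, &n->partial);
	+	list_add(&page->slab_list, &n->partial);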
* Re: [PATCH v4 07/16] slub: Remove page->counters
2018-05-04 15:15 ` Matthew Wilcox
@ 2018-05-04 16:29 ` Christopher Lameter
0 siblings, 0 replies; 30+ messages in thread
From: Christopher Lameter @ 2018-05-04 16:29 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Kirill A. Shutemov, linux-mm, Matthew Wilcox, Andrew Morton,
Lai Jiangshan, Pekka Enberg, Vlastimil Babka, Dave Hansen,
Jérôme Glisse
On Fri, 4 May 2018, Matthew Wilcox wrote:
> So you'd rather have one union that's used for slab/slob/slub? Like this?
Yup that looks better.
>
> struct { /* slab, slob and slub */
> union {
> struct list_head slab_list;
> struct { /* Partial pages */
> struct page *next;
> #ifdef CONFIG_64BIT
> int pages; /* Nr of pages left */
> int pobjects; /* Approximate count */
> #else
> short int pages;
> short int pobjects;
> #endif
> };
> };
> struct kmem_cache *slab_cache;
> /* Double-word boundary */
> void *freelist; /* first free object */
> union {
> void *s_mem; /* first object (slab only) */
> unsigned long counters; /* slub */
> struct {
> unsigned inuse:16;
> unsigned objects:15;
> unsigned frozen:1;
> };
> };
> };
>
^ permalink raw reply [flat|nested] 30+ messages in thread
Thread overview: 30+ messages
2018-04-30 20:22 [PATCH v4 00/16] Rearrange struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 01/16] s390: Use _refcount for pgtables Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 02/16] mm: Split page_type out from _mapcount Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 03/16] mm: Mark pages in use for page tables Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 04/16] mm: Switch s_mem and slab_cache in struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 05/16] mm: Move 'private' union within " Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 06/16] mm: Move _refcount out of struct page union Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 07/16] slub: Remove page->counters Matthew Wilcox
2018-05-01 16:48 ` Christopher Lameter
2018-05-02 17:26 ` Matthew Wilcox
2018-05-02 22:17 ` Kirill A. Shutemov
2018-05-03 0:52 ` Matthew Wilcox
2018-05-03 15:03 ` Christopher Lameter
2018-05-03 18:28 ` Matthew Wilcox
2018-05-04 14:55 ` Christopher Lameter
2018-05-04 15:15 ` Matthew Wilcox
2018-05-04 16:29 ` Christopher Lameter
2018-04-30 20:22 ` [PATCH v4 08/16] mm: Combine first three unions in struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 09/16] mm: Use page->deferred_list Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 10/16] mm: Move lru union within struct page Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 11/16] mm: Combine first two unions in " Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 12/16] mm: Improve struct page documentation Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 13/16] mm: Add pt_mm to struct page Matthew Wilcox
2018-05-02 8:12 ` Kirill A. Shutemov
2018-05-03 0:11 ` Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 14/16] mm: Add hmm_data " Matthew Wilcox
2018-04-30 20:22 ` [PATCH v4 15/16] slab,slub: Remove rcu_head size checks Matthew Wilcox
2018-05-01 16:46 ` Christopher Lameter
2018-04-30 20:22 ` [PATCH v4 16/16] slub: Remove kmem_cache->reserved Matthew Wilcox
2018-05-01 16:43 ` Christopher Lameter