* [PATCHv2 0/4] Fix compound_head() race
From: Kirill A. Shutemov @ 2015-08-17 15:09 UTC
  To: Andrew Morton, Hugh Dickins
  Cc: Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	Michal Hocko, David Rientjes, linux-kernel, linux-mm,
	Kirill A. Shutemov

Here's my attempt at fixing the recently discovered race in compound_head().
It should make compound_head() reliable in all contexts.

The patchset is against Linus' tree. Let me know if it needs to be rebased
onto a different baseline.

It's expected to conflict with my page-flags patchset and should probably
be applied before it.

v2: Per Hugh's suggestion, page->compound_head is moved into the third
    double word. This way we avoid the memory overhead v1 had in some
    cases.

    This place in struct page is rather overloaded. More testing is
    required to make sure we don't collide with any other users.

Kirill A. Shutemov (4):
  mm: drop page->slab_page
  zsmalloc: use page->private instead of page->first_page
  mm: pack compound_dtor and compound_order into one word in struct page
  mm: make compound_head() robust

 Documentation/vm/split_page_table_lock |  4 +-
 arch/xtensa/configs/iss_defconfig      |  1 -
 include/linux/mm.h                     | 75 +++++++++----------------------
 include/linux/mm_types.h               | 20 ++++++---
 include/linux/page-flags.h             | 80 ++++++++--------------------------
 mm/Kconfig                             | 12 -----
 mm/debug.c                             |  5 ---
 mm/huge_memory.c                       |  3 +-
 mm/hugetlb.c                           | 16 +++----
 mm/internal.h                          |  4 +-
 mm/memory-failure.c                    |  7 ---
 mm/page_alloc.c                        | 37 ++++++++++------
 mm/swap.c                              |  4 +-
 mm/zsmalloc.c                          | 11 +++--
 14 files changed, 94 insertions(+), 185 deletions(-)

-- 
2.5.0


* [PATCHv2 1/4] mm: drop page->slab_page
From: Kirill A. Shutemov @ 2015-08-17 15:09 UTC
  To: Andrew Morton, Hugh Dickins
  Cc: Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	Michal Hocko, David Rientjes, linux-kernel, linux-mm,
	Kirill A. Shutemov, Joonsoo Kim, Andi Kleen, Christoph Lameter

Since 8456a648cf44 ("slab: use struct page for slab management") nobody
uses the slab_page field in struct page.

Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
---
 include/linux/mm_types.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0038ac7466fd..58620ac7f15c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -140,7 +140,6 @@ struct page {
 #endif
 		};
 
-		struct slab *slab_page; /* slab fields */
 		struct rcu_head rcu_head;	/* Used by SLAB
 						 * when destroying via RCU
 						 */
-- 
2.5.0


* [PATCHv2 2/4] zsmalloc: use page->private instead of page->first_page
From: Kirill A. Shutemov @ 2015-08-17 15:09 UTC
  To: Andrew Morton, Hugh Dickins
  Cc: Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	Michal Hocko, David Rientjes, linux-kernel, linux-mm,
	Kirill A. Shutemov

We are going to rework how compound_head() works. It will no longer use
page->first_page as we have it now.

The only other user of page->first_page beyond compound pages is
zsmalloc.

Let's use page->private instead of page->first_page here. It occupies
the same storage space.
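
For reference, the round-trip through the unsigned long field looks like
this. A minimal userspace sketch, not kernel code: the struct and the two
helpers are simplified stand-ins for the real definitions in mm.h.

#include <assert.h>
#include <stdio.h>

struct page { unsigned long private; };

static void set_page_private(struct page *page, unsigned long v)
{
	page->private = v;
}

static unsigned long page_private(struct page *page)
{
	return page->private;
}

int main(void)
{
	struct page first_page, sub_page;

	/* link a sub-page back to its first page, as alloc_zspage() does */
	set_page_private(&sub_page, (unsigned long)&first_page);
	assert((struct page *)page_private(&sub_page) == &first_page);
	printf("sub_page->private points back at %p\n",
	       (void *)page_private(&sub_page));
	return 0;
}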

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/zsmalloc.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0a7f81aa2249..a85754e69879 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -16,7 +16,7 @@
  * struct page(s) to form a zspage.
  *
  * Usage of struct page fields:
- *	page->first_page: points to the first component (0-order) page
+ *	page->private: points to the first component (0-order) page
  *	page->index (union with page->freelist): offset of the first object
  *		starting in this page. For the first page, this is
  *		always 0, so we use this field (aka freelist) to point
@@ -26,8 +26,7 @@
  *
  *	For _first_ page only:
  *
- *	page->private (union with page->first_page): refers to the
- *		component page after the first page
+ *	page->private: refers to the component page after the first page
  *		If the page is first_page for huge object, it stores handle.
  *		Look at size_class->huge.
  *	page->freelist: points to the first free object in zspage.
@@ -770,7 +769,7 @@ static struct page *get_first_page(struct page *page)
 	if (is_first_page(page))
 		return page;
 	else
-		return page->first_page;
+		return (struct page *)page_private(page);
 }
 
 static struct page *get_next_page(struct page *page)
@@ -955,7 +954,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 	 * Allocate individual pages and link them together as:
 	 * 1. first page->private = first sub-page
 	 * 2. all sub-pages are linked together using page->lru
-	 * 3. each sub-page is linked to the first page using page->first_page
+	 * 3. each sub-page is linked to the first page using page->private
 	 *
 	 * For each size class, First/Head pages are linked together using
 	 * page->lru. Also, we set PG_private to identify the first page
@@ -980,7 +979,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 		if (i == 1)
 			set_page_private(first_page, (unsigned long)page);
 		if (i >= 1)
-			page->first_page = first_page;
+			set_page_private(page, (unsigned long)first_page);
 		if (i >= 2)
 			list_add(&page->lru, &prev_page->lru);
 		if (i == class->pages_per_zspage - 1)	/* last page */
-- 
2.5.0


* [PATCHv2 3/4] mm: pack compound_dtor and compound_order into one word in struct page
From: Kirill A. Shutemov @ 2015-08-17 15:09 UTC
  To: Andrew Morton, Hugh Dickins
  Cc: Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	Michal Hocko, David Rientjes, linux-kernel, linux-mm,
	Kirill A. Shutemov

The patch halves the space occupied by compound_dtor and compound_order in
struct page.

For compound_order, it's a trivial long -> int/short conversion.

For get_compound_page_dtor(), we now use a hardcoded table for destructor
lookup and store its index in the struct page instead of a direct pointer
to the destructor. It shouldn't be much trouble to maintain the table: we
have only two destructors plus NULL currently.

This patch frees up one word in tail pages for reuse. It is preparation
for the next patch.
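
The scheme boils down to the following standalone userspace sketch. The
enum values match the patch; the two destructors here are print-only
stand-ins and the page argument is reduced to a void pointer.

#include <stdio.h>

typedef void compound_page_dtor(void *page);

enum {
	NULL_COMPOUND_DTOR,
	COMPOUND_PAGE_DTOR,
	HUGETLB_PAGE_DTOR,
	NR_COMPOUND_DTORS,
};

static void free_compound_page(void *page) { (void)page; printf("compound dtor\n"); }
static void free_huge_page(void *page) { (void)page; printf("hugetlb dtor\n"); }

/* Keep in sync with the enum, as the comment in mm.h warns. */
static compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
	NULL,
	free_compound_page,
	free_huge_page,
};

int main(void)
{
	unsigned int dtor = HUGETLB_PAGE_DTOR;	/* what page[1].compound_dtor stores */

	if (dtor < NR_COMPOUND_DTORS && compound_page_dtors[dtor])
		compound_page_dtors[dtor](NULL);	/* table lookup replaces the stored pointer */
	return 0;
}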

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h       | 22 +++++++++++++++++-----
 include/linux/mm_types.h | 11 +++++++----
 mm/hugetlb.c             |  8 ++++----
 mm/page_alloc.c          |  9 ++++++++-
 4 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2e872f92dbac..9c21bbb8875a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -575,18 +575,30 @@ int split_free_page(struct page *page);
 /*
  * Compound pages have a destructor function.  Provide a
  * prototype for that function and accessor functions.
- * These are _only_ valid on the head of a PG_compound page.
+ * These are _only_ valid on the head of a compound page.
  */
+typedef void compound_page_dtor(struct page *);
+
+/* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
+enum {
+	NULL_COMPOUND_DTOR,
+	COMPOUND_PAGE_DTOR,
+	HUGETLB_PAGE_DTOR,
+	NR_COMPOUND_DTORS,
+};
+extern compound_page_dtor * const compound_page_dtors[];
 
 static inline void set_compound_page_dtor(struct page *page,
-						compound_page_dtor *dtor)
+		unsigned int compound_dtor)
 {
-	page[1].compound_dtor = dtor;
+	VM_BUG_ON_PAGE(compound_dtor >= NR_COMPOUND_DTORS, page);
+	page[1].compound_dtor = compound_dtor;
 }
 
 static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
 {
-	return page[1].compound_dtor;
+	VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page);
+	return compound_page_dtors[page[1].compound_dtor];
 }
 
 static inline int compound_order(struct page *page)
@@ -596,7 +608,7 @@ static inline int compound_order(struct page *page)
 	return page[1].compound_order;
 }
 
-static inline void set_compound_order(struct page *page, unsigned long order)
+static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 58620ac7f15c..63cdfe7ec336 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -28,8 +28,6 @@ struct mem_cgroup;
 		IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))
 #define ALLOC_SPLIT_PTLOCKS	(SPINLOCK_SIZE > BITS_PER_LONG/8)
 
-typedef void compound_page_dtor(struct page *);
-
 /*
  * Each physical page in the system has a struct page associated with
  * it to keep track of whatever it is we are using the page for at the
@@ -145,8 +143,13 @@ struct page {
 						 */
 		/* First tail page of compound page */
 		struct {
-			compound_page_dtor *compound_dtor;
-			unsigned long compound_order;
+#ifdef CONFIG_64BIT
+			unsigned int compound_dtor;
+			unsigned int compound_order;
+#else
+			unsigned short int compound_dtor;
+			unsigned short int compound_order;
+#endif
 		};
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a8c3087089d8..8ea74caa1fa8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -969,7 +969,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_writeback);
 	}
 	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
-	set_compound_page_dtor(page, NULL);
+	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
 	set_page_refcounted(page);
 	if (hstate_is_gigantic(h)) {
 		destroy_compound_gigantic_page(page, huge_page_order(h));
@@ -1065,7 +1065,7 @@ void free_huge_page(struct page *page)
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
 	INIT_LIST_HEAD(&page->lru);
-	set_compound_page_dtor(page, free_huge_page);
+	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
 	spin_lock(&hugetlb_lock);
 	set_hugetlb_cgroup(page, NULL);
 	h->nr_huge_pages++;
@@ -1117,7 +1117,7 @@ int PageHuge(struct page *page)
 		return 0;
 
 	page = compound_head(page);
-	return get_compound_page_dtor(page) == free_huge_page;
+	return page[1].compound_dtor == HUGETLB_PAGE_DTOR;
 }
 EXPORT_SYMBOL_GPL(PageHuge);
 
@@ -1314,7 +1314,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 	if (page) {
 		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
-		set_compound_page_dtor(page, free_huge_page);
+		set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
 		set_hugetlb_cgroup(page, NULL);
 		/*
 		 * We incremented the global counters already
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df959b7d6085..beab86e694b2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -208,6 +208,13 @@ static char * const zone_names[MAX_NR_ZONES] = {
 	 "Movable",
 };
 
+static void free_compound_page(struct page *page);
+compound_page_dtor * const compound_page_dtors[] = {
+	NULL,
+	free_compound_page,
+	free_huge_page,
+};
+
 int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 
@@ -437,7 +444,7 @@ void prep_compound_page(struct page *page, unsigned long order)
 	int i;
 	int nr_pages = 1 << order;
 
-	set_compound_page_dtor(page, free_compound_page);
+	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
 	set_compound_order(page, order);
 	__SetPageHead(page);
 	for (i = 1; i < nr_pages; i++) {
-- 
2.5.0


* [PATCHv2 4/4] mm: make compound_head() robust
From: Kirill A. Shutemov @ 2015-08-17 15:09 UTC
  To: Andrew Morton, Hugh Dickins
  Cc: Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	Michal Hocko, David Rientjes, linux-kernel, linux-mm,
	Kirill A. Shutemov

Hugh has pointed out that a compound_head() call can be unsafe in some
contexts. Here's one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail->first_page = NULL
      head = tail->first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail->first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    <head == NULL dereferencing>

The race is purely theoretical. I don't think it's possible to trigger it
in practice. But who knows.

We can fix the race by changing how we encode PageTail() and compound_head()
within struct page, so that they can be updated in one shot.

The patch introduces page->compound_head in the third double word block,
in front of compound_dtor and compound_order. That means it shares storage
space with:

 - page->lru.next;
 - page->next;
 - page->rcu_head.next;
 - page->pmd_huge_pte;

That's too long a list to be absolutely sure, but it looks like nobody uses
bit 0 of the word. It can be used to encode PageTail(). And if the bit is
set, the rest of the word is a pointer to the head page.
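
The encoding itself, as a standalone userspace sketch. It is
single-threaded, so the READ_ONCE()/WRITE_ONCE() from the real patch are
omitted, and struct page is reduced to the one word in question.

#include <assert.h>

struct page { unsigned long compound_head; };

static void set_compound_head(struct page *page, struct page *head)
{
	/* struct page is word-aligned, so bit 0 of the pointer is free */
	page->compound_head = (unsigned long)head + 1;
}

static int PageTail(struct page *page)
{
	return page->compound_head & 1;
}

static struct page *compound_head(struct page *page)
{
	unsigned long head = page->compound_head;

	if (head & 1)
		return (struct page *)(head - 1);
	return page;
}

int main(void)
{
	struct page head = { 0 }, tail = { 0 };

	/* one write sets the tail flag and the head pointer together */
	set_compound_head(&tail, &head);
	assert(PageTail(&tail) && !PageTail(&head));
	assert(compound_head(&tail) == &head);
	assert(compound_head(&head) == &head);
	return 0;
}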

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/vm/split_page_table_lock |  4 +-
 arch/xtensa/configs/iss_defconfig      |  1 -
 include/linux/mm.h                     | 53 ++--------------------
 include/linux/mm_types.h               |  8 +++-
 include/linux/page-flags.h             | 80 ++++++++--------------------------
 mm/Kconfig                             | 12 -----
 mm/debug.c                             |  5 ---
 mm/huge_memory.c                       |  3 +-
 mm/hugetlb.c                           |  8 +---
 mm/internal.h                          |  4 +-
 mm/memory-failure.c                    |  7 ---
 mm/page_alloc.c                        | 28 +++++++-----
 mm/swap.c                              |  4 +-
 13 files changed, 53 insertions(+), 164 deletions(-)

diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock
index 6dea4fd5c961..62842a857dab 100644
--- a/Documentation/vm/split_page_table_lock
+++ b/Documentation/vm/split_page_table_lock
@@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
 which must be called on PTE table allocation / freeing.
 
 Make sure the architecture doesn't use slab allocator for page table
-allocation: slab uses page->slab_cache and page->first_page for its pages.
-These fields share storage with page->ptl.
+allocation: slab uses page->slab_cache for its pages.
+This field shares storage with page->ptl.
 
 PMD split lock only makes sense if you have more than two page table
 levels.
diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig
index e4d193e7a300..5c7c385f21c4 100644
--- a/arch/xtensa/configs/iss_defconfig
+++ b/arch/xtensa/configs/iss_defconfig
@@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y
 # CONFIG_SPARSEMEM_MANUAL is not set
 CONFIG_FLATMEM=y
 CONFIG_FLAT_NODE_MEM_MAP=y
-CONFIG_PAGEFLAGS_EXTENDED=y
 CONFIG_SPLIT_PTLOCK_CPUS=4
 # CONFIG_PHYS_ADDR_T_64BIT is not set
 CONFIG_ZONE_DMA_FLAG=1
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9c21bbb8875a..5090a0b9bb43 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page,
 #endif
 }
 
-static inline struct page *compound_head_by_tail(struct page *tail)
-{
-	struct page *head = tail->first_page;
-
-	/*
-	 * page->first_page may be a dangling pointer to an old
-	 * compound page, so recheck that it is still a tail
-	 * page before returning.
-	 */
-	smp_rmb();
-	if (likely(PageTail(tail)))
-		return head;
-	return tail;
-}
-
-/*
- * Since either compound page could be dismantled asynchronously in THP
- * or we access asynchronously arbitrary positioned struct page, there
- * would be tail flag race. To handle this race, we should call
- * smp_rmb() before checking tail flag. compound_head_by_tail() did it.
- */
-static inline struct page *compound_head(struct page *page)
-{
-	if (unlikely(PageTail(page)))
-		return compound_head_by_tail(page);
-	return page;
-}
-
-/*
- * If we access compound page synchronously such as access to
- * allocated page, there is no need to handle tail flag race, so we can
- * check tail flag directly without any synchronization primitive.
- */
-static inline struct page *compound_head_fast(struct page *page)
-{
-	if (unlikely(PageTail(page)))
-		return page->first_page;
-	return page;
-}
-
 /*
  * The atomic page->_mapcount, starts from -1: so that transitions
  * both from it and to it can be tracked, using atomic_inc_and_test
@@ -525,7 +485,7 @@ static inline void get_huge_page_tail(struct page *page)
 	VM_BUG_ON_PAGE(!PageTail(page), page);
 	VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
 	VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
-	if (compound_tail_refcounted(page->first_page))
+	if (compound_tail_refcounted(compound_head(page)))
 		atomic_inc(&page->_mapcount);
 }
 
@@ -548,13 +508,7 @@ static inline struct page *virt_to_head_page(const void *x)
 {
 	struct page *page = virt_to_page(x);
 
-	/*
-	 * We don't need to worry about synchronization of tail flag
-	 * when we call virt_to_head_page() since it is only called for
-	 * already allocated page and this page won't be freed until
-	 * this virt_to_head_page() is finished. So use _fast variant.
-	 */
-	return compound_head_fast(page);
+	return compound_head(page);
 }
 
 /*
@@ -1494,8 +1448,7 @@ static inline bool ptlock_init(struct page *page)
 	 * with 0. Make sure nobody took it in use in between.
 	 *
 	 * It can happen if arch try to use slab for page table allocation:
-	 * slab code uses page->slab_cache and page->first_page (for tail
-	 * pages), which share storage with page->ptl.
+	 * slab code uses page->slab_cache, which shares storage with page->ptl.
 	 */
 	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
 	if (!ptlock_alloc(page))
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 63cdfe7ec336..ad47b61d96ce 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -120,7 +120,12 @@ struct page {
 		};
 	};
 
-	/* Third double word block */
+	/*
+	 * Third double word block
+	 *
+	 * WARNING: bit 0 of the first word encodes PageTail and *must* be 0
+	 * for non-tail pages.
+	 */
 	union {
 		struct list_head lru;	/* Pageout list, eg. active_list
 					 * protected by zone->lru_lock !
@@ -143,6 +148,7 @@ struct page {
 						 */
 		/* First tail page of compound page */
 		struct {
+			unsigned long compound_head; /* If bit zero is set */
 #ifdef CONFIG_64BIT
 			unsigned int compound_dtor;
 			unsigned int compound_order;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 41c93844fb1d..9b865158e452 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -86,12 +86,7 @@ enum pageflags {
 	PG_private,		/* If pagecache, has fs-private data */
 	PG_private_2,		/* If pagecache, has fs aux data */
 	PG_writeback,		/* Page is under writeback */
-#ifdef CONFIG_PAGEFLAGS_EXTENDED
 	PG_head,		/* A head page */
-	PG_tail,		/* A tail page */
-#else
-	PG_compound,		/* A compound page */
-#endif
 	PG_swapcache,		/* Swap page: swp_entry_t in private */
 	PG_mappedtodisk,	/* Has blocks allocated on-disk */
 	PG_reclaim,		/* To be reclaimed asap */
@@ -387,85 +382,46 @@ static inline void set_page_writeback_keepwrite(struct page *page)
 	test_set_page_writeback_keepwrite(page);
 }
 
-#ifdef CONFIG_PAGEFLAGS_EXTENDED
-/*
- * System with lots of page flags available. This allows separate
- * flags for PageHead() and PageTail() checks of compound pages so that bit
- * tests can be used in performance sensitive paths. PageCompound is
- * generally not used in hot code paths except arch/powerpc/mm/init_64.c
- * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
- * and avoid handling those in real mode.
- */
 __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
-__PAGEFLAG(Tail, tail)
 
-static inline int PageCompound(struct page *page)
-{
-	return page->flags & ((1L << PG_head) | (1L << PG_tail));
-
-}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static inline void ClearPageCompound(struct page *page)
+static inline int PageTail(struct page *page)
 {
-	BUG_ON(!PageHead(page));
-	ClearPageHead(page);
+	return READ_ONCE(page->compound_head) & 1;
 }
-#endif
-
-#define PG_head_mask ((1L << PG_head))
 
-#else
-/*
- * Reduce page flag use as much as possible by overlapping
- * compound page flags with the flags used for page cache pages. Possible
- * because PageCompound is always set for compound pages and not for
- * pages on the LRU and/or pagecache.
- */
-TESTPAGEFLAG(Compound, compound)
-__SETPAGEFLAG(Head, compound)  __CLEARPAGEFLAG(Head, compound)
-
-/*
- * PG_reclaim is used in combination with PG_compound to mark the
- * head and tail of a compound page. This saves one page flag
- * but makes it impossible to use compound pages for the page cache.
- * The PG_reclaim bit would have to be used for reclaim or readahead
- * if compound pages enter the page cache.
- *
- * PG_compound & PG_reclaim	=> Tail page
- * PG_compound & ~PG_reclaim	=> Head page
- */
-#define PG_head_mask ((1L << PG_compound))
-#define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))
-
-static inline int PageHead(struct page *page)
+static inline void set_compound_head(struct page *page, struct page *head)
 {
-	return ((page->flags & PG_head_tail_mask) == PG_head_mask);
+	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
 }
 
-static inline int PageTail(struct page *page)
+static inline void clear_compound_head(struct page *page)
 {
-	return ((page->flags & PG_head_tail_mask) == PG_head_tail_mask);
+	WRITE_ONCE(page->compound_head, 0);
 }
 
-static inline void __SetPageTail(struct page *page)
+static inline struct page *compound_head(struct page *page)
 {
-	page->flags |= PG_head_tail_mask;
+	unsigned long head = READ_ONCE(page->compound_head);
+
+	if (unlikely(head & 1))
+		return (struct page *) (head - 1);
+	return page;
 }
 
-static inline void __ClearPageTail(struct page *page)
+static inline int PageCompound(struct page *page)
 {
-	page->flags &= ~PG_head_tail_mask;
-}
+	return PageHead(page) || PageTail(page);
 
+}
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline void ClearPageCompound(struct page *page)
 {
-	BUG_ON((page->flags & PG_head_tail_mask) != (1 << PG_compound));
-	clear_bit(PG_compound, &page->flags);
+	BUG_ON(!PageHead(page));
+	ClearPageHead(page);
 }
 #endif
 
-#endif /* !PAGEFLAGS_EXTENDED */
+#define PG_head_mask ((1L << PG_head))
 
 #ifdef CONFIG_HUGETLB_PAGE
 int PageHuge(struct page *page);
diff --git a/mm/Kconfig b/mm/Kconfig
index e79de2bd12cd..454579d31081 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -200,18 +200,6 @@ config MEMORY_HOTREMOVE
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
 
-#
-# If we have space for more page flags then we can enable additional
-# optimizations and functionality.
-#
-# Regular Sparsemem takes page flag bits for the sectionid if it does not
-# use a virtual memmap. Disable extended page flags for 32 bit platforms
-# that require the use of a sectionid in the page flags.
-#
-config PAGEFLAGS_EXTENDED
-	def_bool y
-	depends on 64BIT || SPARSEMEM_VMEMMAP || !SPARSEMEM
-
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
diff --git a/mm/debug.c b/mm/debug.c
index 76089ddf99ea..205e5ef957ab 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -25,12 +25,7 @@ static const struct trace_print_flags pageflag_names[] = {
 	{1UL << PG_private,		"private"	},
 	{1UL << PG_private_2,		"private_2"	},
 	{1UL << PG_writeback,		"writeback"	},
-#ifdef CONFIG_PAGEFLAGS_EXTENDED
 	{1UL << PG_head,		"head"		},
-	{1UL << PG_tail,		"tail"		},
-#else
-	{1UL << PG_compound,		"compound"	},
-#endif
 	{1UL << PG_swapcache,		"swapcache"	},
 	{1UL << PG_mappedtodisk,	"mappedtodisk"	},
 	{1UL << PG_reclaim,		"reclaim"	},
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 097c7a4bfbd9..330377f83ac7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1686,8 +1686,7 @@ static void __split_huge_page_refcount(struct page *page,
 				      (1L << PG_unevictable)));
 		page_tail->flags |= (1L << PG_dirty);
 
-		/* clear PageTail before overwriting first_page */
-		smp_wmb();
+		clear_compound_head(page_tail);
 
 		/*
 		 * __split_huge_page_splitting() already set the
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8ea74caa1fa8..53c0709fd87b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -824,9 +824,8 @@ static void destroy_compound_gigantic_page(struct page *page,
 	struct page *p = page + 1;
 
 	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
-		__ClearPageTail(p);
+		clear_compound_head(p);
 		set_page_refcounted(p);
-		p->first_page = NULL;
 	}
 
 	set_compound_order(page, 0);
@@ -1099,10 +1098,7 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
 		 */
 		__ClearPageReserved(p);
 		set_page_count(p, 0);
-		p->first_page = page;
-		/* Make sure p->first_page is always valid for PageTail() */
-		smp_wmb();
-		__SetPageTail(p);
+		set_compound_head(p, page);
 	}
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 36b23f1e2ca6..89e21a07080a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -61,9 +61,9 @@ static inline void __get_page_tail_foll(struct page *page,
 	 * speculative page access (like in
 	 * page_cache_get_speculative()) on tail pages.
 	 */
-	VM_BUG_ON_PAGE(atomic_read(&page->first_page->_count) <= 0, page);
+	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
 	if (get_page_head)
-		atomic_inc(&page->first_page->_count);
+		atomic_inc(&compound_head(page)->_count);
 	get_huge_page_tail(page);
 }
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1f4446a90cef..4d1a5de9653d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -787,8 +787,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 #define lru		(1UL << PG_lru)
 #define swapbacked	(1UL << PG_swapbacked)
 #define head		(1UL << PG_head)
-#define tail		(1UL << PG_tail)
-#define compound	(1UL << PG_compound)
 #define slab		(1UL << PG_slab)
 #define reserved	(1UL << PG_reserved)
 
@@ -811,12 +809,7 @@ static struct page_state {
 	 */
 	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },
 
-#ifdef CONFIG_PAGEFLAGS_EXTENDED
 	{ head,		head,		MF_MSG_HUGE,		me_huge_page },
-	{ tail,		tail,		MF_MSG_HUGE,		me_huge_page },
-#else
-	{ compound,	compound,	MF_MSG_HUGE,		me_huge_page },
-#endif
 
 	{ sc|dirty,	sc|dirty,	MF_MSG_DIRTY_SWAPCACHE,	me_swapcache_dirty },
 	{ sc|dirty,	sc,		MF_MSG_CLEAN_SWAPCACHE,	me_swapcache_clean },
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index beab86e694b2..16c3f97a7d30 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -426,7 +426,7 @@ out:
  *
  * The remaining PAGE_SIZE pages are called "tail pages".
  *
- * All pages have PG_compound set.  All tail pages have their ->first_page
+ * All pages have PG_compound set.  All tail pages have their compound_head()
  * pointing at the head page.
  *
  * The first tail page's ->lru.next holds the address of the compound page's
@@ -450,10 +450,7 @@ void prep_compound_page(struct page *page, unsigned long order)
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
 		set_page_count(p, 0);
-		p->first_page = page;
-		/* Make sure p->first_page is always valid for PageTail() */
-		smp_wmb();
-		__SetPageTail(p);
+		set_compound_head(p, page);
 	}
 }
 
@@ -828,17 +825,24 @@ static void free_one_page(struct zone *zone,
 
 static int free_tail_pages_check(struct page *head_page, struct page *page)
 {
-	if (!IS_ENABLED(CONFIG_DEBUG_VM))
-		return 0;
+	int ret = 1;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_VM)) {
+		ret = 0;
+		goto out;
+	}
 	if (unlikely(!PageTail(page))) {
 		bad_page(page, "PageTail not set", 0);
-		return 1;
+		goto out;
 	}
-	if (unlikely(page->first_page != head_page)) {
-		bad_page(page, "first_page not consistent", 0);
-		return 1;
+	if (unlikely(compound_head(page) != head_page)) {
+		bad_page(page, "compound_head not consistent", 0);
+		goto out;
 	}
-	return 0;
+	ret = 0;
+out:
+	clear_compound_head(page);
+	return ret;
 }
 
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
diff --git a/mm/swap.c b/mm/swap.c
index a3a0a2f1f7c3..faa9e1687dea 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -200,7 +200,7 @@ out_put_single:
 				__put_single_page(page);
 			return;
 		}
-		VM_BUG_ON_PAGE(page_head != page->first_page, page);
+		VM_BUG_ON_PAGE(page_head != compound_head(page), page);
 		/*
 		 * We can release the refcount taken by
 		 * get_page_unless_zero() now that
@@ -261,7 +261,7 @@ static void put_compound_page(struct page *page)
 	 *  Case 3 is possible, as we may race with
 	 *  __split_huge_page_refcount tearing down a THP page.
 	 */
-	page_head = compound_head_by_tail(page);
+	page_head = compound_head(page);
 	if (!__compound_tail_refcounted(page_head))
 		put_unrefcounted_compound_page(page_head, page);
 	else
-- 
2.5.0


 		bad_page(page, "PageTail not set", 0);
-		return 1;
+		goto out;
 	}
-	if (unlikely(page->first_page != head_page)) {
-		bad_page(page, "first_page not consistent", 0);
-		return 1;
+	if (unlikely(compound_head(page) != head_page)) {
+		bad_page(page, "compound_head not consistent", 0);
+		goto out;
 	}
-	return 0;
+	ret = 0;
+out:
+	clear_compound_head(page);
+	return ret;
 }
 
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
diff --git a/mm/swap.c b/mm/swap.c
index a3a0a2f1f7c3..faa9e1687dea 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -200,7 +200,7 @@ out_put_single:
 				__put_single_page(page);
 			return;
 		}
-		VM_BUG_ON_PAGE(page_head != page->first_page, page);
+		VM_BUG_ON_PAGE(page_head != compound_head(page), page);
 		/*
 		 * We can release the refcount taken by
 		 * get_page_unless_zero() now that
@@ -261,7 +261,7 @@ static void put_compound_page(struct page *page)
 	 *  Case 3 is possible, as we may race with
 	 *  __split_huge_page_refcount tearing down a THP page.
 	 */
-	page_head = compound_head_by_tail(page);
+	page_head = compound_head(page);
 	if (!__compound_tail_refcounted(page_head))
 		put_unrefcounted_compound_page(page_head, page);
 	else
-- 
2.5.0
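
To see how the new helpers fit together, here is a minimal userspace model
of the encoding (a sketch only: struct pg and the helpers below are
simplified stand-ins, not the kernel code; it assumes, as the patch does,
that the structure is word-aligned so bit 0 of its address is free):

#include <assert.h>
#include <stdio.h>

struct pg {
	unsigned long compound_head;	/* bit 0 set => tail page */
};

static void set_compound_head(struct pg *page, struct pg *head)
{
	/* Tag and pointer are published in a single store. */
	page->compound_head = (unsigned long)head + 1;
}

static void clear_compound_head(struct pg *page)
{
	page->compound_head = 0;
}

static struct pg *compound_head(struct pg *page)
{
	/* One load yields both the tail flag and the head pointer;
	 * the kernel wraps these accesses in READ_ONCE()/WRITE_ONCE(). */
	unsigned long head = page->compound_head;

	if (head & 1)
		return (struct pg *)(head - 1);
	return page;
}

int main(void)
{
	struct pg pages[2] = { { 0 }, { 0 } };

	set_compound_head(&pages[1], &pages[0]);	/* prep_compound_page() */
	assert(compound_head(&pages[1]) == &pages[0]);	/* tail resolves to head */
	assert(compound_head(&pages[0]) == &pages[0]);	/* head resolves to itself */

	clear_compound_head(&pages[1]);			/* split/free path */
	assert(compound_head(&pages[1]) == &pages[1]);

	printf("tag round-trip OK\n");
	return 0;
}

Since both facts live in one word, the reader needs no barrier pairing:
any single load observes a consistent (tag, pointer) pair.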

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 3/4] mm: pack compound_dtor and compound_order into one word in struct page
  2015-08-17 15:09   ` Kirill A. Shutemov
@ 2015-08-17 22:59     ` Hugh Dickins
  -1 siblings, 0 replies; 34+ messages in thread
From: Hugh Dickins @ 2015-08-17 22:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
	Vlastimil Babka, Johannes Weiner, Michal Hocko, David Rientjes,
	linux-kernel, linux-mm

On Mon, 17 Aug 2015, Kirill A. Shutemov wrote:

> The patch halves the space occupied by compound_dtor and compound_order in
> struct page.
> 
> For compound_order, it's a trivial long -> int/short conversion.
> 
> For get_compound_page_dtor(), we now use a hardcoded table for destructor
> lookup and store its index in the struct page instead of a direct pointer
> to the destructor. It shouldn't be much trouble to maintain the table: we
> have only two destructors and NULL currently.
> 
> This patch frees up one word in tail pages for reuse. This is preparation
> for the next patch.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Well, yes, that is one way of doing it.  But I'd have thought the time
for complicating it, instead of simplifying it with direct calls,
would be when someone adds another destructor.  Up to Andrew.
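
For context, the indirection being weighed looks roughly like this
(a standalone sketch, not the kernel code: struct pg and the names
tail_before/tail_after are simplified stand-ins):

#include <assert.h>
#include <stddef.h>

struct pg;				/* stand-in for struct page */
typedef void dtor_fn(struct pg *);

/* Before: the first tail page carries a full function pointer. */
struct tail_before {
	dtor_fn *compound_dtor;		/* one word */
	unsigned long compound_order;	/* another word */
};

/* After: a small index into a fixed table of known destructors. */
enum { NULL_DTOR, COMPOUND_DTOR, HUGETLB_DTOR, NR_DTORS };

static void free_compound(struct pg *p) { (void)p; /* buddy free */ }
static void free_huge(struct pg *p)     { (void)p; /* hugetlb free */ }

static dtor_fn *const dtor_table[NR_DTORS] = {
	[NULL_DTOR]	= NULL,
	[COMPOUND_DTOR]	= free_compound,
	[HUGETLB_DTOR]	= free_huge,
};

struct tail_after {
	unsigned int compound_dtor;	/* index, not a pointer */
	unsigned int compound_order;	/* packs into the same word */
};

static dtor_fn *get_dtor(const struct tail_after *t)
{
	assert(t->compound_dtor < NR_DTORS);
	return dtor_table[t->compound_dtor];	/* lookup replaces direct call */
}

The direct pointer is simpler to call but occupies a full word; the index
adds one table lookup on the free path and requires keeping the enum and
table in sync by hand, which is the maintenance cost being weighed here.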

> ---
>  include/linux/mm.h       | 22 +++++++++++++++++-----
>  include/linux/mm_types.h | 11 +++++++----
>  mm/hugetlb.c             |  8 ++++----
>  mm/page_alloc.c          |  9 ++++++++-
>  4 files changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2e872f92dbac..9c21bbb8875a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -575,18 +575,30 @@ int split_free_page(struct page *page);
>  /*
>   * Compound pages have a destructor function.  Provide a
>   * prototype for that function and accessor functions.
> - * These are _only_ valid on the head of a PG_compound page.
> + * These are _only_ valid on the head of a compound page.
>   */
> +typedef void compound_page_dtor(struct page *);
> +
> +/* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
> +enum {
> +	NULL_COMPOUND_DTOR,
> +	COMPOUND_PAGE_DTOR,
> +	HUGETLB_PAGE_DTOR,
> +	NR_COMPOUND_DTORS,
> +};
> +extern compound_page_dtor * const compound_page_dtors[];
>  
>  static inline void set_compound_page_dtor(struct page *page,
> -						compound_page_dtor *dtor)
> +		unsigned int compound_dtor)
>  {
> -	page[1].compound_dtor = dtor;
> +	VM_BUG_ON_PAGE(compound_dtor >= NR_COMPOUND_DTORS, page);
> +	page[1].compound_dtor = compound_dtor;
>  }
>  
>  static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
>  {
> -	return page[1].compound_dtor;
> +	VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page);
> +	return compound_page_dtors[page[1].compound_dtor];
>  }
>  
>  static inline int compound_order(struct page *page)
> @@ -596,7 +608,7 @@ static inline int compound_order(struct page *page)
>  	return page[1].compound_order;
>  }
>  
> -static inline void set_compound_order(struct page *page, unsigned long order)
> +static inline void set_compound_order(struct page *page, unsigned int order)
>  {
>  	page[1].compound_order = order;
>  }
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 58620ac7f15c..63cdfe7ec336 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -28,8 +28,6 @@ struct mem_cgroup;
>  		IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))
>  #define ALLOC_SPLIT_PTLOCKS	(SPINLOCK_SIZE > BITS_PER_LONG/8)
>  
> -typedef void compound_page_dtor(struct page *);
> -
>  /*
>   * Each physical page in the system has a struct page associated with
>   * it to keep track of whatever it is we are using the page for at the
> @@ -145,8 +143,13 @@ struct page {
>  						 */
>  		/* First tail page of compound page */
>  		struct {
> -			compound_page_dtor *compound_dtor;
> -			unsigned long compound_order;
> +#ifdef CONFIG_64BIT
> +			unsigned int compound_dtor;
> +			unsigned int compound_order;
> +#else
> +			unsigned short int compound_dtor;
> +			unsigned short int compound_order;
> +#endif
>  		};
>  
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a8c3087089d8..8ea74caa1fa8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -969,7 +969,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>  				1 << PG_writeback);
>  	}
>  	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
> -	set_compound_page_dtor(page, NULL);
> +	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
>  	set_page_refcounted(page);
>  	if (hstate_is_gigantic(h)) {
>  		destroy_compound_gigantic_page(page, huge_page_order(h));
> @@ -1065,7 +1065,7 @@ void free_huge_page(struct page *page)
>  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
>  	INIT_LIST_HEAD(&page->lru);
> -	set_compound_page_dtor(page, free_huge_page);
> +	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
>  	spin_lock(&hugetlb_lock);
>  	set_hugetlb_cgroup(page, NULL);
>  	h->nr_huge_pages++;
> @@ -1117,7 +1117,7 @@ int PageHuge(struct page *page)
>  		return 0;
>  
>  	page = compound_head(page);
> -	return get_compound_page_dtor(page) == free_huge_page;
> +	return page[1].compound_dtor == HUGETLB_PAGE_DTOR;
>  }
>  EXPORT_SYMBOL_GPL(PageHuge);
>  
> @@ -1314,7 +1314,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  	if (page) {
>  		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
> -		set_compound_page_dtor(page, free_huge_page);
> +		set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
>  		set_hugetlb_cgroup(page, NULL);
>  		/*
>  		 * We incremented the global counters already
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index df959b7d6085..beab86e694b2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -208,6 +208,13 @@ static char * const zone_names[MAX_NR_ZONES] = {
>  	 "Movable",
>  };
>  
> +static void free_compound_page(struct page *page);
> +compound_page_dtor * const compound_page_dtors[] = {
> +	NULL,
> +	free_compound_page,
> +	free_huge_page,
> +};
> +
>  int min_free_kbytes = 1024;
>  int user_min_free_kbytes = -1;
>  
> @@ -437,7 +444,7 @@ void prep_compound_page(struct page *page, unsigned long order)
>  	int i;
>  	int nr_pages = 1 << order;
>  
> -	set_compound_page_dtor(page, free_compound_page);
> +	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
>  	set_compound_order(page, order);
>  	__SetPageHead(page);
>  	for (i = 1; i < nr_pages; i++) {
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 1/4] mm: drop page->slab_page
  2015-08-17 15:09   ` Kirill A. Shutemov
@ 2015-08-18  0:43     ` David Rientjes
  -1 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2015-08-18  0:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
	Vlastimil Babka, Johannes Weiner, Michal Hocko, linux-kernel,
	linux-mm, Joonsoo Kim, Andi Kleen, Christoph Lameter

On Mon, 17 Aug 2015, Kirill A. Shutemov wrote:

> Since 8456a648cf44 ("slab: use struct page for slab management") nobody
> uses the slab_page field in struct page.
> 
> Let's drop it.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Christoph Lameter <cl@linux.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 4/4] mm: make compound_head() robust
  2015-08-17 15:09   ` Kirill A. Shutemov
@ 2015-08-18 11:20     ` Michal Hocko
  -1 siblings, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2015-08-18 11:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
	Vlastimil Babka, Johannes Weiner, David Rientjes, linux-kernel,
	linux-mm

On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote:
> Hugh has pointed out that a compound_head() call can be unsafe in some
> contexts. Here's one example:
> 
> 	CPU0					CPU1
> 
> isolate_migratepages_block()
>   page_count()
>     compound_head()
>       !!PageTail() == true
> 					put_page()
> 					  tail->first_page = NULL
>       head = tail->first_page
> 					alloc_pages(__GFP_COMP)
> 					   prep_compound_page()
> 					     tail->first_page = head
> 					     __SetPageTail(p);
>       !!PageTail() == true
>     <head == NULL dereferencing>
> 
> The race is purely theoretical. I don't think it's possible to trigger it
> in practice. But who knows.
> 
> We can fix the race by changing how we encode PageTail() and
> compound_head() within struct page so that we can update them in one shot.
> 
> The patch introduces page->compound_head into the third double word block,
> in front of compound_dtor and compound_order. That means it shares storage
> space with:
> 
>  - page->lru.next;
>  - page->next;
>  - page->rcu_head.next;
>  - page->pmd_huge_pte;
> 
> That's too long a list to be absolutely sure, but it looks like nobody uses
> bit 0 of the word. It can be used to encode PageTail(). And if the bit is
> set, the rest of the word is a pointer to the head page.

I didn't look too closely, but the general idea makes sense to me and the
overall code simplification is sound. I will give it a more detailed review
after I sort out other stuff.
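
The reader-side difference can be sketched like this (schematic stand-ins
only: the old code encoded "tail" in page flag bits rather than a separate
field, but the load pattern was the same):

/* Old scheme: flag and head pointer live in different words, so the
 * reader performs two loads, and a racing free/realloc can slip in
 * between them; this is exactly the window in the diagram above. */
struct pg_old {
	int is_tail;			/* stands in for PageTail() */
	struct pg_old *first_page;
};

static struct pg_old *old_head(struct pg_old *p)
{
	if (p->is_tail)			/* load #1: the flag */
		return p->first_page;	/* load #2: may be stale or NULL */
	return p;
}

/* New scheme: one word, one load; the tag (bit 0) and the pointer
 * cannot be observed out of sync. */
struct pg_new {
	unsigned long compound_head;	/* bit 0 set => tail */
};

static struct pg_new *new_head(struct pg_new *p)
{
	unsigned long head = p->compound_head;	/* READ_ONCE() in the kernel */

	if (head & 1)
		return (struct pg_new *)(head - 1);
	return p;
}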

> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  Documentation/vm/split_page_table_lock |  4 +-
>  arch/xtensa/configs/iss_defconfig      |  1 -
>  include/linux/mm.h                     | 53 ++--------------------
>  include/linux/mm_types.h               |  8 +++-
>  include/linux/page-flags.h             | 80 ++++++++--------------------------
>  mm/Kconfig                             | 12 -----
>  mm/debug.c                             |  5 ---
>  mm/huge_memory.c                       |  3 +-
>  mm/hugetlb.c                           |  8 +---
>  mm/internal.h                          |  4 +-
>  mm/memory-failure.c                    |  7 ---
>  mm/page_alloc.c                        | 28 +++++++-----
>  mm/swap.c                              |  4 +-
>  13 files changed, 53 insertions(+), 164 deletions(-)
> 
> diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock
> index 6dea4fd5c961..62842a857dab 100644
> --- a/Documentation/vm/split_page_table_lock
> +++ b/Documentation/vm/split_page_table_lock
> @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
>  which must be called on PTE table allocation / freeing.
>  
>  Make sure the architecture doesn't use slab allocator for page table
> -allocation: slab uses page->slab_cache and page->first_page for its pages.
> -These fields share storage with page->ptl.
> +allocation: slab uses page->slab_cache for its pages.
> +This field shares storage with page->ptl.
>  
>  PMD split lock only makes sense if you have more than two page table
>  levels.
> diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig
> index e4d193e7a300..5c7c385f21c4 100644
> --- a/arch/xtensa/configs/iss_defconfig
> +++ b/arch/xtensa/configs/iss_defconfig
> @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y
>  # CONFIG_SPARSEMEM_MANUAL is not set
>  CONFIG_FLATMEM=y
>  CONFIG_FLAT_NODE_MEM_MAP=y
> -CONFIG_PAGEFLAGS_EXTENDED=y
>  CONFIG_SPLIT_PTLOCK_CPUS=4
>  # CONFIG_PHYS_ADDR_T_64BIT is not set
>  CONFIG_ZONE_DMA_FLAG=1
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9c21bbb8875a..5090a0b9bb43 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page,
>  #endif
>  }
>  
> -static inline struct page *compound_head_by_tail(struct page *tail)
> -{
> -	struct page *head = tail->first_page;
> -
> -	/*
> -	 * page->first_page may be a dangling pointer to an old
> -	 * compound page, so recheck that it is still a tail
> -	 * page before returning.
> -	 */
> -	smp_rmb();
> -	if (likely(PageTail(tail)))
> -		return head;
> -	return tail;
> -}
> -
> -/*
> - * Since either compound page could be dismantled asynchronously in THP
> - * or we access asynchronously arbitrary positioned struct page, there
> - * would be tail flag race. To handle this race, we should call
> - * smp_rmb() before checking tail flag. compound_head_by_tail() did it.
> - */
> -static inline struct page *compound_head(struct page *page)
> -{
> -	if (unlikely(PageTail(page)))
> -		return compound_head_by_tail(page);
> -	return page;
> -}
> -
> -/*
> - * If we access compound page synchronously such as access to
> - * allocated page, there is no need to handle tail flag race, so we can
> - * check tail flag directly without any synchronization primitive.
> - */
> -static inline struct page *compound_head_fast(struct page *page)
> -{
> -	if (unlikely(PageTail(page)))
> -		return page->first_page;
> -	return page;
> -}
> -
>  /*
>   * The atomic page->_mapcount, starts from -1: so that transitions
>   * both from it and to it can be tracked, using atomic_inc_and_test
> @@ -525,7 +485,7 @@ static inline void get_huge_page_tail(struct page *page)
>  	VM_BUG_ON_PAGE(!PageTail(page), page);
>  	VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
>  	VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
> -	if (compound_tail_refcounted(page->first_page))
> +	if (compound_tail_refcounted(compound_head(page)))
>  		atomic_inc(&page->_mapcount);
>  }
>  
> @@ -548,13 +508,7 @@ static inline struct page *virt_to_head_page(const void *x)
>  {
>  	struct page *page = virt_to_page(x);
>  
> -	/*
> -	 * We don't need to worry about synchronization of tail flag
> -	 * when we call virt_to_head_page() since it is only called for
> -	 * already allocated page and this page won't be freed until
> -	 * this virt_to_head_page() is finished. So use _fast variant.
> -	 */
> -	return compound_head_fast(page);
> +	return compound_head(page);
>  }
>  
>  /*
> @@ -1494,8 +1448,7 @@ static inline bool ptlock_init(struct page *page)
>  	 * with 0. Make sure nobody took it in use in between.
>  	 *
>  	 * It can happen if arch try to use slab for page table allocation:
> -	 * slab code uses page->slab_cache and page->first_page (for tail
> -	 * pages), which share storage with page->ptl.
> +	 * slab code uses page->slab_cache, which share storage with page->ptl.
>  	 */
>  	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
>  	if (!ptlock_alloc(page))
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 63cdfe7ec336..ad47b61d96ce 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -120,7 +120,12 @@ struct page {
>  		};
>  	};
>  
> -	/* Third double word block */
> +	/*
> +	 * Third double word block
> +	 *
> +	 * WARNING: bit 0 of the first word encode PageTail and *must* be 0
> +	 * for non-tail pages.
> +	 */
>  	union {
>  		struct list_head lru;	/* Pageout list, eg. active_list
>  					 * protected by zone->lru_lock !
> @@ -143,6 +148,7 @@ struct page {
>  						 */
>  		/* First tail page of compound page */
>  		struct {
> +			unsigned long compound_head; /* If bit zero is set */
>  #ifdef CONFIG_64BIT
>  			unsigned int compound_dtor;
>  			unsigned int compound_order;
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 41c93844fb1d..9b865158e452 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -86,12 +86,7 @@ enum pageflags {
>  	PG_private,		/* If pagecache, has fs-private data */
>  	PG_private_2,		/* If pagecache, has fs aux data */
>  	PG_writeback,		/* Page is under writeback */
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
>  	PG_head,		/* A head page */
> -	PG_tail,		/* A tail page */
> -#else
> -	PG_compound,		/* A compound page */
> -#endif
>  	PG_swapcache,		/* Swap page: swp_entry_t in private */
>  	PG_mappedtodisk,	/* Has blocks allocated on-disk */
>  	PG_reclaim,		/* To be reclaimed asap */
> @@ -387,85 +382,46 @@ static inline void set_page_writeback_keepwrite(struct page *page)
>  	test_set_page_writeback_keepwrite(page);
>  }
>  
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
> -/*
> - * System with lots of page flags available. This allows separate
> - * flags for PageHead() and PageTail() checks of compound pages so that bit
> - * tests can be used in performance sensitive paths. PageCompound is
> - * generally not used in hot code paths except arch/powerpc/mm/init_64.c
> - * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
> - * and avoid handling those in real mode.
> - */
>  __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
> -__PAGEFLAG(Tail, tail)
>  
> -static inline int PageCompound(struct page *page)
> -{
> -	return page->flags & ((1L << PG_head) | (1L << PG_tail));
> -
> -}
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -static inline void ClearPageCompound(struct page *page)
> +static inline int PageTail(struct page *page)
>  {
> -	BUG_ON(!PageHead(page));
> -	ClearPageHead(page);
> +	return READ_ONCE(page->compound_head) & 1;
>  }
> -#endif
> -
> -#define PG_head_mask ((1L << PG_head))
>  
> -#else
> -/*
> - * Reduce page flag use as much as possible by overlapping
> - * compound page flags with the flags used for page cache pages. Possible
> - * because PageCompound is always set for compound pages and not for
> - * pages on the LRU and/or pagecache.
> - */
> -TESTPAGEFLAG(Compound, compound)
> -__SETPAGEFLAG(Head, compound)  __CLEARPAGEFLAG(Head, compound)
> -
> -/*
> - * PG_reclaim is used in combination with PG_compound to mark the
> - * head and tail of a compound page. This saves one page flag
> - * but makes it impossible to use compound pages for the page cache.
> - * The PG_reclaim bit would have to be used for reclaim or readahead
> - * if compound pages enter the page cache.
> - *
> - * PG_compound & PG_reclaim	=> Tail page
> - * PG_compound & ~PG_reclaim	=> Head page
> - */
> -#define PG_head_mask ((1L << PG_compound))
> -#define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))
> -
> -static inline int PageHead(struct page *page)
> +static inline void set_compound_head(struct page *page, struct page *head)
>  {
> -	return ((page->flags & PG_head_tail_mask) == PG_head_mask);
> +	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
>  }
>  
> -static inline int PageTail(struct page *page)
> +static inline void clear_compound_head(struct page *page)
>  {
> -	return ((page->flags & PG_head_tail_mask) == PG_head_tail_mask);
> +	WRITE_ONCE(page->compound_head, 0);
>  }
>  
> -static inline void __SetPageTail(struct page *page)
> +static inline struct page *compound_head(struct page *page)
>  {
> -	page->flags |= PG_head_tail_mask;
> +	unsigned long head = READ_ONCE(page->compound_head);
> +
> +	if (unlikely(head & 1))
> +		return (struct page *) (head - 1);
> +	return page;
>  }
>  
> -static inline void __ClearPageTail(struct page *page)
> +static inline int PageCompound(struct page *page)
>  {
> -	page->flags &= ~PG_head_tail_mask;
> -}
> +	return PageHead(page) || PageTail(page);
>  
> +}
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline void ClearPageCompound(struct page *page)
>  {
> -	BUG_ON((page->flags & PG_head_tail_mask) != (1 << PG_compound));
> -	clear_bit(PG_compound, &page->flags);
> +	BUG_ON(!PageHead(page));
> +	ClearPageHead(page);
>  }
>  #endif
>  
> -#endif /* !PAGEFLAGS_EXTENDED */
> +#define PG_head_mask ((1L << PG_head))
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  int PageHuge(struct page *page);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e79de2bd12cd..454579d31081 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -200,18 +200,6 @@ config MEMORY_HOTREMOVE
>  	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
>  	depends on MIGRATION
>  
> -#
> -# If we have space for more page flags then we can enable additional
> -# optimizations and functionality.
> -#
> -# Regular Sparsemem takes page flag bits for the sectionid if it does not
> -# use a virtual memmap. Disable extended page flags for 32 bit platforms
> -# that require the use of a sectionid in the page flags.
> -#
> -config PAGEFLAGS_EXTENDED
> -	def_bool y
> -	depends on 64BIT || SPARSEMEM_VMEMMAP || !SPARSEMEM
> -
>  # Heavily threaded applications may benefit from splitting the mm-wide
>  # page_table_lock, so that faults on different parts of the user address
>  # space can be handled with less contention: split it at this NR_CPUS.
> diff --git a/mm/debug.c b/mm/debug.c
> index 76089ddf99ea..205e5ef957ab 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -25,12 +25,7 @@ static const struct trace_print_flags pageflag_names[] = {
>  	{1UL << PG_private,		"private"	},
>  	{1UL << PG_private_2,		"private_2"	},
>  	{1UL << PG_writeback,		"writeback"	},
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
>  	{1UL << PG_head,		"head"		},
> -	{1UL << PG_tail,		"tail"		},
> -#else
> -	{1UL << PG_compound,		"compound"	},
> -#endif
>  	{1UL << PG_swapcache,		"swapcache"	},
>  	{1UL << PG_mappedtodisk,	"mappedtodisk"	},
>  	{1UL << PG_reclaim,		"reclaim"	},
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 097c7a4bfbd9..330377f83ac7 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1686,8 +1686,7 @@ static void __split_huge_page_refcount(struct page *page,
>  				      (1L << PG_unevictable)));
>  		page_tail->flags |= (1L << PG_dirty);
>  
> -		/* clear PageTail before overwriting first_page */
> -		smp_wmb();
> +		clear_compound_head(page_tail);
>  
>  		/*
>  		 * __split_huge_page_splitting() already set the
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8ea74caa1fa8..53c0709fd87b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -824,9 +824,8 @@ static void destroy_compound_gigantic_page(struct page *page,
>  	struct page *p = page + 1;
>  
>  	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
> -		__ClearPageTail(p);
> +		clear_compound_head(p);
>  		set_page_refcounted(p);
> -		p->first_page = NULL;
>  	}
>  
>  	set_compound_order(page, 0);
> @@ -1099,10 +1098,7 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
>  		 */
>  		__ClearPageReserved(p);
>  		set_page_count(p, 0);
> -		p->first_page = page;
> -		/* Make sure p->first_page is always valid for PageTail() */
> -		smp_wmb();
> -		__SetPageTail(p);
> +		set_compound_head(p, page);
>  	}
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 36b23f1e2ca6..89e21a07080a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -61,9 +61,9 @@ static inline void __get_page_tail_foll(struct page *page,
>  	 * speculative page access (like in
>  	 * page_cache_get_speculative()) on tail pages.
>  	 */
> -	VM_BUG_ON_PAGE(atomic_read(&page->first_page->_count) <= 0, page);
> +	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
>  	if (get_page_head)
> -		atomic_inc(&page->first_page->_count);
> +		atomic_inc(&compound_head(page)->_count);
>  	get_huge_page_tail(page);
>  }
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 1f4446a90cef..4d1a5de9653d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -787,8 +787,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>  #define lru		(1UL << PG_lru)
>  #define swapbacked	(1UL << PG_swapbacked)
>  #define head		(1UL << PG_head)
> -#define tail		(1UL << PG_tail)
> -#define compound	(1UL << PG_compound)
>  #define slab		(1UL << PG_slab)
>  #define reserved	(1UL << PG_reserved)
>  
> @@ -811,12 +809,7 @@ static struct page_state {
>  	 */
>  	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },
>  
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
>  	{ head,		head,		MF_MSG_HUGE,		me_huge_page },
> -	{ tail,		tail,		MF_MSG_HUGE,		me_huge_page },
> -#else
> -	{ compound,	compound,	MF_MSG_HUGE,		me_huge_page },
> -#endif
>  
>  	{ sc|dirty,	sc|dirty,	MF_MSG_DIRTY_SWAPCACHE,	me_swapcache_dirty },
>  	{ sc|dirty,	sc,		MF_MSG_CLEAN_SWAPCACHE,	me_swapcache_clean },
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index beab86e694b2..16c3f97a7d30 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -426,7 +426,7 @@ out:
>   *
>   * The remaining PAGE_SIZE pages are called "tail pages".
>   *
> - * All pages have PG_compound set.  All tail pages have their ->first_page
> + * All pages have PG_compound set.  All tail pages have their compound_head()
>   * pointing at the head page.
>   *
>   * The first tail page's ->lru.next holds the address of the compound page's
> @@ -450,10 +450,7 @@ void prep_compound_page(struct page *page, unsigned long order)
>  	for (i = 1; i < nr_pages; i++) {
>  		struct page *p = page + i;
>  		set_page_count(p, 0);
> -		p->first_page = page;
> -		/* Make sure p->first_page is always valid for PageTail() */
> -		smp_wmb();
> -		__SetPageTail(p);
> +		set_compound_head(p, page);
>  	}
>  }
>  
> @@ -828,17 +825,24 @@ static void free_one_page(struct zone *zone,
>  
>  static int free_tail_pages_check(struct page *head_page, struct page *page)
>  {
> -	if (!IS_ENABLED(CONFIG_DEBUG_VM))
> -		return 0;
> +	int ret = 1;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_VM)) {
> +		ret = 0;
> +		goto out;
> +	}
>  	if (unlikely(!PageTail(page))) {
>  		bad_page(page, "PageTail not set", 0);
> -		return 1;
> +		goto out;
>  	}
> -	if (unlikely(page->first_page != head_page)) {
> -		bad_page(page, "first_page not consistent", 0);
> -		return 1;
> +	if (unlikely(compound_head(page) != head_page)) {
> +		bad_page(page, "compound_head not consistent", 0);
> +		goto out;
>  	}
> -	return 0;
> +	ret = 0;
> +out:
> +	clear_compound_head(page);
> +	return ret;
>  }
>  
>  static void __meminit __init_single_page(struct page *page, unsigned long pfn,
> diff --git a/mm/swap.c b/mm/swap.c
> index a3a0a2f1f7c3..faa9e1687dea 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -200,7 +200,7 @@ out_put_single:
>  				__put_single_page(page);
>  			return;
>  		}
> -		VM_BUG_ON_PAGE(page_head != page->first_page, page);
> +		VM_BUG_ON_PAGE(page_head != compound_head(page), page);
>  		/*
>  		 * We can release the refcount taken by
>  		 * get_page_unless_zero() now that
> @@ -261,7 +261,7 @@ static void put_compound_page(struct page *page)
>  	 *  Case 3 is possible, as we may race with
>  	 *  __split_huge_page_refcount tearing down a THP page.
>  	 */
> -	page_head = compound_head_by_tail(page);
> +	page_head = compound_head(page);
>  	if (!__compound_tail_refcounted(page_head))
>  		put_unrefcounted_compound_page(page_head, page);
>  	else
> -- 
> 2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 4/4] mm: make compound_head() robust
@ 2015-08-18 11:20     ` Michal Hocko
  0 siblings, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2015-08-18 11:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
	Vlastimil Babka, Johannes Weiner, David Rientjes, linux-kernel,
	linux-mm

On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote:
> Hugh has pointed that compound_head() call can be unsafe in some
> context. There's one example:
> 
> 	CPU0					CPU1
> 
> isolate_migratepages_block()
>   page_count()
>     compound_head()
>       !!PageTail() == true
> 					put_page()
> 					  tail->first_page = NULL
>       head = tail->first_page
> 					alloc_pages(__GFP_COMP)
> 					   prep_compound_page()
> 					     tail->first_page = head
> 					     __SetPageTail(p);
>       !!PageTail() == true
>     <head == NULL dereferencing>
> 
> The race is pure theoretical. I don't it's possible to trigger it in
> practice. But who knows.
> 
> We can fix the race by changing how encode PageTail() and compound_head()
> within struct page to be able to update them in one shot.
> 
> The patch introduces page->compound_head into third double word block in
> front of compound_dtor and compound_order. That means it shares storage
> space with:
> 
>  - page->lru.next;
>  - page->next;
>  - page->rcu_head.next;
>  - page->pmd_huge_pte;
> 
> That's too long list to be absolutely sure, but looks like nobody uses
> bit 0 of the word. It can be used to encode PageTail(). And if the bit
> set, rest of the word is pointer to head page.

I didn't look too closely but the general idea makes sense to me and the
overal code simplification is sound. I will give it more detailed review
after I sort out other stuff.

> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  Documentation/vm/split_page_table_lock |  4 +-
>  arch/xtensa/configs/iss_defconfig      |  1 -
>  include/linux/mm.h                     | 53 ++--------------------
>  include/linux/mm_types.h               |  8 +++-
>  include/linux/page-flags.h             | 80 ++++++++--------------------------
>  mm/Kconfig                             | 12 -----
>  mm/debug.c                             |  5 ---
>  mm/huge_memory.c                       |  3 +-
>  mm/hugetlb.c                           |  8 +---
>  mm/internal.h                          |  4 +-
>  mm/memory-failure.c                    |  7 ---
>  mm/page_alloc.c                        | 28 +++++++-----
>  mm/swap.c                              |  4 +-
>  13 files changed, 53 insertions(+), 164 deletions(-)
> 
> diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock
> index 6dea4fd5c961..62842a857dab 100644
> --- a/Documentation/vm/split_page_table_lock
> +++ b/Documentation/vm/split_page_table_lock
> @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
>  which must be called on PTE table allocation / freeing.
>  
>  Make sure the architecture doesn't use slab allocator for page table
> -allocation: slab uses page->slab_cache and page->first_page for its pages.
> -These fields share storage with page->ptl.
> +allocation: slab uses page->slab_cache for its pages.
> +This field shares storage with page->ptl.
>  
>  PMD split lock only makes sense if you have more than two page table
>  levels.
> diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig
> index e4d193e7a300..5c7c385f21c4 100644
> --- a/arch/xtensa/configs/iss_defconfig
> +++ b/arch/xtensa/configs/iss_defconfig
> @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y
>  # CONFIG_SPARSEMEM_MANUAL is not set
>  CONFIG_FLATMEM=y
>  CONFIG_FLAT_NODE_MEM_MAP=y
> -CONFIG_PAGEFLAGS_EXTENDED=y
>  CONFIG_SPLIT_PTLOCK_CPUS=4
>  # CONFIG_PHYS_ADDR_T_64BIT is not set
>  CONFIG_ZONE_DMA_FLAG=1
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9c21bbb8875a..5090a0b9bb43 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page,
>  #endif
>  }
>  
> -static inline struct page *compound_head_by_tail(struct page *tail)
> -{
> -	struct page *head = tail->first_page;
> -
> -	/*
> -	 * page->first_page may be a dangling pointer to an old
> -	 * compound page, so recheck that it is still a tail
> -	 * page before returning.
> -	 */
> -	smp_rmb();
> -	if (likely(PageTail(tail)))
> -		return head;
> -	return tail;
> -}
> -
> -/*
> - * Since either compound page could be dismantled asynchronously in THP
> - * or we access asynchronously arbitrary positioned struct page, there
> - * would be tail flag race. To handle this race, we should call
> - * smp_rmb() before checking tail flag. compound_head_by_tail() did it.
> - */
> -static inline struct page *compound_head(struct page *page)
> -{
> -	if (unlikely(PageTail(page)))
> -		return compound_head_by_tail(page);
> -	return page;
> -}
> -
> -/*
> - * If we access compound page synchronously such as access to
> - * allocated page, there is no need to handle tail flag race, so we can
> - * check tail flag directly without any synchronization primitive.
> - */
> -static inline struct page *compound_head_fast(struct page *page)
> -{
> -	if (unlikely(PageTail(page)))
> -		return page->first_page;
> -	return page;
> -}
> -
>  /*
>   * The atomic page->_mapcount, starts from -1: so that transitions
>   * both from it and to it can be tracked, using atomic_inc_and_test
> @@ -525,7 +485,7 @@ static inline void get_huge_page_tail(struct page *page)
>  	VM_BUG_ON_PAGE(!PageTail(page), page);
>  	VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
>  	VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
> -	if (compound_tail_refcounted(page->first_page))
> +	if (compound_tail_refcounted(compound_head(page)))
>  		atomic_inc(&page->_mapcount);
>  }
>  
> @@ -548,13 +508,7 @@ static inline struct page *virt_to_head_page(const void *x)
>  {
>  	struct page *page = virt_to_page(x);
>  
> -	/*
> -	 * We don't need to worry about synchronization of tail flag
> -	 * when we call virt_to_head_page() since it is only called for
> -	 * already allocated page and this page won't be freed until
> -	 * this virt_to_head_page() is finished. So use _fast variant.
> -	 */
> -	return compound_head_fast(page);
> +	return compound_head(page);
>  }
>  
>  /*
> @@ -1494,8 +1448,7 @@ static inline bool ptlock_init(struct page *page)
>  	 * with 0. Make sure nobody took it in use in between.
>  	 *
>  	 * It can happen if arch try to use slab for page table allocation:
> -	 * slab code uses page->slab_cache and page->first_page (for tail
> -	 * pages), which share storage with page->ptl.
> +	 * slab code uses page->slab_cache, which share storage with page->ptl.
>  	 */
>  	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
>  	if (!ptlock_alloc(page))
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 63cdfe7ec336..ad47b61d96ce 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -120,7 +120,12 @@ struct page {
>  		};
>  	};
>  
> -	/* Third double word block */
> +	/*
> +	 * Third double word block
> +	 *
> +	 * WARNING: bit 0 of the first word encode PageTail and *must* be 0
> +	 * for non-tail pages.
> +	 */
>  	union {
>  		struct list_head lru;	/* Pageout list, eg. active_list
>  					 * protected by zone->lru_lock !
> @@ -143,6 +148,7 @@ struct page {
>  						 */
>  		/* First tail page of compound page */
>  		struct {
> +			unsigned long compound_head; /* If bit zero is set */
>  #ifdef CONFIG_64BIT
>  			unsigned int compound_dtor;
>  			unsigned int compound_order;
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 41c93844fb1d..9b865158e452 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -86,12 +86,7 @@ enum pageflags {
>  	PG_private,		/* If pagecache, has fs-private data */
>  	PG_private_2,		/* If pagecache, has fs aux data */
>  	PG_writeback,		/* Page is under writeback */
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
>  	PG_head,		/* A head page */
> -	PG_tail,		/* A tail page */
> -#else
> -	PG_compound,		/* A compound page */
> -#endif
>  	PG_swapcache,		/* Swap page: swp_entry_t in private */
>  	PG_mappedtodisk,	/* Has blocks allocated on-disk */
>  	PG_reclaim,		/* To be reclaimed asap */
> @@ -387,85 +382,46 @@ static inline void set_page_writeback_keepwrite(struct page *page)
>  	test_set_page_writeback_keepwrite(page);
>  }
>  
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
> -/*
> - * System with lots of page flags available. This allows separate
> - * flags for PageHead() and PageTail() checks of compound pages so that bit
> - * tests can be used in performance sensitive paths. PageCompound is
> - * generally not used in hot code paths except arch/powerpc/mm/init_64.c
> - * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
> - * and avoid handling those in real mode.
> - */
>  __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
> -__PAGEFLAG(Tail, tail)
>  
> -static inline int PageCompound(struct page *page)
> -{
> -	return page->flags & ((1L << PG_head) | (1L << PG_tail));
> -
> -}
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -static inline void ClearPageCompound(struct page *page)
> +static inline int PageTail(struct page *page)
>  {
> -	BUG_ON(!PageHead(page));
> -	ClearPageHead(page);
> +	return READ_ONCE(page->compound_head) & 1;
>  }
> -#endif
> -
> -#define PG_head_mask ((1L << PG_head))
>  
> -#else
> -/*
> - * Reduce page flag use as much as possible by overlapping
> - * compound page flags with the flags used for page cache pages. Possible
> - * because PageCompound is always set for compound pages and not for
> - * pages on the LRU and/or pagecache.
> - */
> -TESTPAGEFLAG(Compound, compound)
> -__SETPAGEFLAG(Head, compound)  __CLEARPAGEFLAG(Head, compound)
> -
> -/*
> - * PG_reclaim is used in combination with PG_compound to mark the
> - * head and tail of a compound page. This saves one page flag
> - * but makes it impossible to use compound pages for the page cache.
> - * The PG_reclaim bit would have to be used for reclaim or readahead
> - * if compound pages enter the page cache.
> - *
> - * PG_compound & PG_reclaim	=> Tail page
> - * PG_compound & ~PG_reclaim	=> Head page
> - */
> -#define PG_head_mask ((1L << PG_compound))
> -#define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))
> -
> -static inline int PageHead(struct page *page)
> +static inline void set_compound_head(struct page *page, struct page *head)
>  {
> -	return ((page->flags & PG_head_tail_mask) == PG_head_mask);
> +	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
>  }
>  
> -static inline int PageTail(struct page *page)
> +static inline void clear_compound_head(struct page *page)
>  {
> -	return ((page->flags & PG_head_tail_mask) == PG_head_tail_mask);
> +	WRITE_ONCE(page->compound_head, 0);
>  }
>  
> -static inline void __SetPageTail(struct page *page)
> +static inline struct page *compound_head(struct page *page)
>  {
> -	page->flags |= PG_head_tail_mask;
> +	unsigned long head = READ_ONCE(page->compound_head);
> +
> +	if (unlikely(head & 1))
> +		return (struct page *) (head - 1);
> +	return page;
>  }
>  
> -static inline void __ClearPageTail(struct page *page)
> +static inline int PageCompound(struct page *page)
>  {
> -	page->flags &= ~PG_head_tail_mask;
> -}
> +	return PageHead(page) || PageTail(page);
>  
> +}
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline void ClearPageCompound(struct page *page)
>  {
> -	BUG_ON((page->flags & PG_head_tail_mask) != (1 << PG_compound));
> -	clear_bit(PG_compound, &page->flags);
> +	BUG_ON(!PageHead(page));
> +	ClearPageHead(page);
>  }
>  #endif
>  
> -#endif /* !PAGEFLAGS_EXTENDED */
> +#define PG_head_mask ((1L << PG_head))
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  int PageHuge(struct page *page);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e79de2bd12cd..454579d31081 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -200,18 +200,6 @@ config MEMORY_HOTREMOVE
>  	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
>  	depends on MIGRATION
>  
> -#
> -# If we have space for more page flags then we can enable additional
> -# optimizations and functionality.
> -#
> -# Regular Sparsemem takes page flag bits for the sectionid if it does not
> -# use a virtual memmap. Disable extended page flags for 32 bit platforms
> -# that require the use of a sectionid in the page flags.
> -#
> -config PAGEFLAGS_EXTENDED
> -	def_bool y
> -	depends on 64BIT || SPARSEMEM_VMEMMAP || !SPARSEMEM
> -
>  # Heavily threaded applications may benefit from splitting the mm-wide
>  # page_table_lock, so that faults on different parts of the user address
>  # space can be handled with less contention: split it at this NR_CPUS.
> diff --git a/mm/debug.c b/mm/debug.c
> index 76089ddf99ea..205e5ef957ab 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -25,12 +25,7 @@ static const struct trace_print_flags pageflag_names[] = {
>  	{1UL << PG_private,		"private"	},
>  	{1UL << PG_private_2,		"private_2"	},
>  	{1UL << PG_writeback,		"writeback"	},
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
>  	{1UL << PG_head,		"head"		},
> -	{1UL << PG_tail,		"tail"		},
> -#else
> -	{1UL << PG_compound,		"compound"	},
> -#endif
>  	{1UL << PG_swapcache,		"swapcache"	},
>  	{1UL << PG_mappedtodisk,	"mappedtodisk"	},
>  	{1UL << PG_reclaim,		"reclaim"	},
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 097c7a4bfbd9..330377f83ac7 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1686,8 +1686,7 @@ static void __split_huge_page_refcount(struct page *page,
>  				      (1L << PG_unevictable)));
>  		page_tail->flags |= (1L << PG_dirty);
>  
> -		/* clear PageTail before overwriting first_page */
> -		smp_wmb();
> +		clear_compound_head(page_tail);
>  
>  		/*
>  		 * __split_huge_page_splitting() already set the
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8ea74caa1fa8..53c0709fd87b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -824,9 +824,8 @@ static void destroy_compound_gigantic_page(struct page *page,
>  	struct page *p = page + 1;
>  
>  	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
> -		__ClearPageTail(p);
> +		clear_compound_head(p);
>  		set_page_refcounted(p);
> -		p->first_page = NULL;
>  	}
>  
>  	set_compound_order(page, 0);
> @@ -1099,10 +1098,7 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
>  		 */
>  		__ClearPageReserved(p);
>  		set_page_count(p, 0);
> -		p->first_page = page;
> -		/* Make sure p->first_page is always valid for PageTail() */
> -		smp_wmb();
> -		__SetPageTail(p);
> +		set_compound_head(p, page);
>  	}
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 36b23f1e2ca6..89e21a07080a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -61,9 +61,9 @@ static inline void __get_page_tail_foll(struct page *page,
>  	 * speculative page access (like in
>  	 * page_cache_get_speculative()) on tail pages.
>  	 */
> -	VM_BUG_ON_PAGE(atomic_read(&page->first_page->_count) <= 0, page);
> +	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
>  	if (get_page_head)
> -		atomic_inc(&page->first_page->_count);
> +		atomic_inc(&compound_head(page)->_count);
>  	get_huge_page_tail(page);
>  }
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 1f4446a90cef..4d1a5de9653d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -787,8 +787,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>  #define lru		(1UL << PG_lru)
>  #define swapbacked	(1UL << PG_swapbacked)
>  #define head		(1UL << PG_head)
> -#define tail		(1UL << PG_tail)
> -#define compound	(1UL << PG_compound)
>  #define slab		(1UL << PG_slab)
>  #define reserved	(1UL << PG_reserved)
>  
> @@ -811,12 +809,7 @@ static struct page_state {
>  	 */
>  	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },
>  
> -#ifdef CONFIG_PAGEFLAGS_EXTENDED
>  	{ head,		head,		MF_MSG_HUGE,		me_huge_page },
> -	{ tail,		tail,		MF_MSG_HUGE,		me_huge_page },
> -#else
> -	{ compound,	compound,	MF_MSG_HUGE,		me_huge_page },
> -#endif
>  
>  	{ sc|dirty,	sc|dirty,	MF_MSG_DIRTY_SWAPCACHE,	me_swapcache_dirty },
>  	{ sc|dirty,	sc,		MF_MSG_CLEAN_SWAPCACHE,	me_swapcache_clean },
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index beab86e694b2..16c3f97a7d30 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -426,7 +426,7 @@ out:
>   *
>   * The remaining PAGE_SIZE pages are called "tail pages".
>   *
> - * All pages have PG_compound set.  All tail pages have their ->first_page
> + * All pages have PG_compound set.  All tail pages have their compound_head()
>   * pointing at the head page.
>   *
>   * The first tail page's ->lru.next holds the address of the compound page's
> @@ -450,10 +450,7 @@ void prep_compound_page(struct page *page, unsigned long order)
>  	for (i = 1; i < nr_pages; i++) {
>  		struct page *p = page + i;
>  		set_page_count(p, 0);
> -		p->first_page = page;
> -		/* Make sure p->first_page is always valid for PageTail() */
> -		smp_wmb();
> -		__SetPageTail(p);
> +		set_compound_head(p, page);
>  	}
>  }
>  
> @@ -828,17 +825,24 @@ static void free_one_page(struct zone *zone,
>  
>  static int free_tail_pages_check(struct page *head_page, struct page *page)
>  {
> -	if (!IS_ENABLED(CONFIG_DEBUG_VM))
> -		return 0;
> +	int ret = 1;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_VM)) {
> +		ret = 0;
> +		goto out;
> +	}
>  	if (unlikely(!PageTail(page))) {
>  		bad_page(page, "PageTail not set", 0);
> -		return 1;
> +		goto out;
>  	}
> -	if (unlikely(page->first_page != head_page)) {
> -		bad_page(page, "first_page not consistent", 0);
> -		return 1;
> +	if (unlikely(compound_head(page) != head_page)) {
> +		bad_page(page, "compound_head not consistent", 0);
> +		goto out;
>  	}
> -	return 0;
> +	ret = 0;
> +out:
> +	clear_compound_head(page);
> +	return ret;
>  }
>  
>  static void __meminit __init_single_page(struct page *page, unsigned long pfn,
> diff --git a/mm/swap.c b/mm/swap.c
> index a3a0a2f1f7c3..faa9e1687dea 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -200,7 +200,7 @@ out_put_single:
>  				__put_single_page(page);
>  			return;
>  		}
> -		VM_BUG_ON_PAGE(page_head != page->first_page, page);
> +		VM_BUG_ON_PAGE(page_head != compound_head(page), page);
>  		/*
>  		 * We can release the refcount taken by
>  		 * get_page_unless_zero() now that
> @@ -261,7 +261,7 @@ static void put_compound_page(struct page *page)
>  	 *  Case 3 is possible, as we may race with
>  	 *  __split_huge_page_refcount tearing down a THP page.
>  	 */
> -	page_head = compound_head_by_tail(page);
> +	page_head = compound_head(page);
>  	if (!__compound_tail_refcounted(page_head))
>  		put_unrefcounted_compound_page(page_head, page);
>  	else
> -- 
> 2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 3/4] mm: pack compound_dtor and compound_order into one word in struct page
  2015-08-17 15:09   ` Kirill A. Shutemov
@ 2015-08-18 15:43     ` Michal Hocko
  -1 siblings, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2015-08-18 15:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
	Vlastimil Babka, Johannes Weiner, David Rientjes, linux-kernel,
	linux-mm

On Mon 17-08-15 18:09:04, Kirill A. Shutemov wrote:
> The patch halves space occupied by compound_dtor and compound_order in
> struct page.
> 
> For compound_order, it's trivial long -> int/short conversion.
> 
> For get_compound_page_dtor(), we now use a hardcoded table for destructor
> lookup and store its index in the struct page instead of a direct pointer
> to the destructor. It shouldn't be much trouble to maintain the table: we
> have only two destructors and NULL currently.
> 
> This patch frees up one word in tail pages for reuse. This is preparation
> for the next patch.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Reviewed-by: Michal Hocko <mhocko@suse.com>
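
For reference, the table lookup the changelog describes boils down to
roughly the following sketch (helper and enum names follow the patch; the
exact body and any debug checks here are assumptions, abridged for
illustration):

typedef void compound_page_dtor(struct page *);

/* indexed by NULL_COMPOUND_DTOR, COMPOUND_PAGE_DTOR, HUGETLB_PAGE_DTOR */
compound_page_dtor * const compound_page_dtors[] = {
	NULL,
	free_compound_page,
	free_huge_page,
};

static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
{
	/* the first tail page stores a small table index, not a pointer */
	return compound_page_dtors[page[1].compound_dtor];
}

The extra indirection is one table load on the free path; in exchange the
field shrinks from a pointer to an index, which is what frees up the word
in the tail page.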

[...]
> @@ -145,8 +143,13 @@ struct page {
>  						 */
>  		/* First tail page of compound page */
>  		struct {
> -			compound_page_dtor *compound_dtor;
> -			unsigned long compound_order;
> +#ifdef CONFIG_64BIT
> +			unsigned int compound_dtor;
> +			unsigned int compound_order;
> +#else
> +			unsigned short int compound_dtor;
> +			unsigned short int compound_order;
> +#endif
>  		};

Why do we need this ifdef? We can go with short for both 32b and 64b
AFAICS. We do not use compound_order for anything other than the order,
right?
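
Roughly what that would look like (a sketch of the suggested alternative,
not code from the series); a dtor index is tiny and an order is well below
2^16, so two shorts are enough on both 32b and 64b:

/* First tail page of compound page */
struct {
	unsigned short int compound_dtor;	/* index into compound_page_dtors[] */
	unsigned short int compound_order;
};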

While I am looking at this, it seems we are juggling with the type for
order quite a lot - int, unsigned int and even unsigned long.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 3/4] mm: pack compound_dtor and compound_order into one word in struct page
  2015-08-17 15:09   ` Kirill A. Shutemov
@ 2015-08-18 16:05     ` Michal Hocko
  -1 siblings, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2015-08-18 16:05 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
	Vlastimil Babka, Johannes Weiner, David Rientjes, linux-kernel,
	linux-mm

On Mon 17-08-15 18:09:04, Kirill A. Shutemov wrote:
[...]
> +/* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
> +enum {
> +	NULL_COMPOUND_DTOR,
> +	COMPOUND_PAGE_DTOR,
> +	HUGETLB_PAGE_DTOR,
> +	NR_COMPOUND_DTORS,
> +};
[...]
> +static void free_compound_page(struct page *page);
> +compound_page_dtor * const compound_page_dtors[] = {
> +	NULL,
> +	free_compound_page,
> +	free_huge_page,
> +};
> +

Both need ifdef CONFIG_HUGETLB_PAGE as my compile test battery just found
out.
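
For the record, one way the fix could look, since free_huge_page() only
exists with CONFIG_HUGETLB_PAGE (a sketch of the needed guards, not the
actual v3 change):

enum {
	NULL_COMPOUND_DTOR,
	COMPOUND_PAGE_DTOR,
#ifdef CONFIG_HUGETLB_PAGE
	HUGETLB_PAGE_DTOR,
#endif
	NR_COMPOUND_DTORS,
};

compound_page_dtor * const compound_page_dtors[] = {
	NULL,
	free_compound_page,
#ifdef CONFIG_HUGETLB_PAGE
	free_huge_page,
#endif
};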
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 4/4] mm: make compound_head() robust
  2015-08-18 11:20     ` Michal Hocko
@ 2015-08-18 16:41       ` Michal Hocko
  -1 siblings, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2015-08-18 16:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
	Vlastimil Babka, Johannes Weiner, David Rientjes, linux-kernel,
	linux-mm

On Tue 18-08-15 13:20:22, Michal Hocko wrote:
> On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote:
> > Hugh has pointed that compound_head() call can be unsafe in some
> > context. There's one example:
> > 
> > 	CPU0					CPU1
> > 
> > isolate_migratepages_block()
> >   page_count()
> >     compound_head()
> >       !!PageTail() == true
> > 					put_page()
> > 					  tail->first_page = NULL
> >       head = tail->first_page
> > 					alloc_pages(__GFP_COMP)
> > 					   prep_compound_page()
> > 					     tail->first_page = head
> > 					     __SetPageTail(p);
> >       !!PageTail() == true
> >     <head == NULL dereferencing>
> > 
> > The race is purely theoretical. I don't think it's possible to trigger it
> > in practice. But who knows.
> > 
> > We can fix the race by changing how PageTail() and compound_head() are
> > encoded within struct page, so that both can be updated in one shot.
> > 
> > The patch introduces page->compound_head into the third double word block in
> > front of compound_dtor and compound_order. That means it shares storage
> > space with:
> > 
> >  - page->lru.next;
> >  - page->next;
> >  - page->rcu_head.next;
> >  - page->pmd_huge_pte;
> > 
> > That's too long a list to be absolutely sure, but it looks like nobody uses
> > bit 0 of the word. It can be used to encode PageTail(). And if the bit is
> > set, the rest of the word is a pointer to the head page.
> 
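
The crux of the fix, spelled out: struct page is word-aligned, so bit 0 of
any pointer to a head page is always clear and is free to carry the tail
flag. Flag and pointer then live in a single word, one READ_ONCE() observes
both together, and the torn read in the diagram above cannot happen. The
reader side from the patch (quoted in full below), lightly annotated:

static inline struct page *compound_head(struct page *page)
{
	/* one load fetches the tail flag and the head pointer together */
	unsigned long head = READ_ONCE(page->compound_head);

	if (unlikely(head & 1))
		return (struct page *) (head - 1);	/* strip the tag bit */
	return page;
}
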
> I didn't look too closely but the general idea makes sense to me and the
> overall code simplification is sound. I will give it a more detailed review
> after I sort out other stuff.

AFAICS page::first_page wasn't used outside of the compound page logic, so
you should remove it in this patch. The rest looks good to me.

Acked-by: Michal Hocko <mhocko@suse.com>
 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  Documentation/vm/split_page_table_lock |  4 +-
> >  arch/xtensa/configs/iss_defconfig      |  1 -
> >  include/linux/mm.h                     | 53 ++--------------------
> >  include/linux/mm_types.h               |  8 +++-
> >  include/linux/page-flags.h             | 80 ++++++++--------------------------
> >  mm/Kconfig                             | 12 -----
> >  mm/debug.c                             |  5 ---
> >  mm/huge_memory.c                       |  3 +-
> >  mm/hugetlb.c                           |  8 +---
> >  mm/internal.h                          |  4 +-
> >  mm/memory-failure.c                    |  7 ---
> >  mm/page_alloc.c                        | 28 +++++++-----
> >  mm/swap.c                              |  4 +-
> >  13 files changed, 53 insertions(+), 164 deletions(-)
> > 
> > diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock
> > index 6dea4fd5c961..62842a857dab 100644
> > --- a/Documentation/vm/split_page_table_lock
> > +++ b/Documentation/vm/split_page_table_lock
> > @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
> >  which must be called on PTE table allocation / freeing.
> >  
> >  Make sure the architecture doesn't use slab allocator for page table
> > -allocation: slab uses page->slab_cache and page->first_page for its pages.
> > -These fields share storage with page->ptl.
> > +allocation: slab uses page->slab_cache for its pages.
> > +This field shares storage with page->ptl.
> >  
> >  PMD split lock only makes sense if you have more than two page table
> >  levels.
> > diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig
> > index e4d193e7a300..5c7c385f21c4 100644
> > --- a/arch/xtensa/configs/iss_defconfig
> > +++ b/arch/xtensa/configs/iss_defconfig
> > @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y
> >  # CONFIG_SPARSEMEM_MANUAL is not set
> >  CONFIG_FLATMEM=y
> >  CONFIG_FLAT_NODE_MEM_MAP=y
> > -CONFIG_PAGEFLAGS_EXTENDED=y
> >  CONFIG_SPLIT_PTLOCK_CPUS=4
> >  # CONFIG_PHYS_ADDR_T_64BIT is not set
> >  CONFIG_ZONE_DMA_FLAG=1
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 9c21bbb8875a..5090a0b9bb43 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page,
> >  #endif
> >  }
> >  
> > -static inline struct page *compound_head_by_tail(struct page *tail)
> > -{
> > -	struct page *head = tail->first_page;
> > -
> > -	/*
> > -	 * page->first_page may be a dangling pointer to an old
> > -	 * compound page, so recheck that it is still a tail
> > -	 * page before returning.
> > -	 */
> > -	smp_rmb();
> > -	if (likely(PageTail(tail)))
> > -		return head;
> > -	return tail;
> > -}
> > -
> > -/*
> > - * Since either compound page could be dismantled asynchronously in THP
> > - * or we access asynchronously arbitrary positioned struct page, there
> > - * would be tail flag race. To handle this race, we should call
> > - * smp_rmb() before checking tail flag. compound_head_by_tail() did it.
> > - */
> > -static inline struct page *compound_head(struct page *page)
> > -{
> > -	if (unlikely(PageTail(page)))
> > -		return compound_head_by_tail(page);
> > -	return page;
> > -}
> > -
> > -/*
> > - * If we access compound page synchronously such as access to
> > - * allocated page, there is no need to handle tail flag race, so we can
> > - * check tail flag directly without any synchronization primitive.
> > - */
> > -static inline struct page *compound_head_fast(struct page *page)
> > -{
> > -	if (unlikely(PageTail(page)))
> > -		return page->first_page;
> > -	return page;
> > -}
> > -
> >  /*
> >   * The atomic page->_mapcount, starts from -1: so that transitions
> >   * both from it and to it can be tracked, using atomic_inc_and_test
> > @@ -525,7 +485,7 @@ static inline void get_huge_page_tail(struct page *page)
> >  	VM_BUG_ON_PAGE(!PageTail(page), page);
> >  	VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
> >  	VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
> > -	if (compound_tail_refcounted(page->first_page))
> > +	if (compound_tail_refcounted(compound_head(page)))
> >  		atomic_inc(&page->_mapcount);
> >  }
> >  
> > @@ -548,13 +508,7 @@ static inline struct page *virt_to_head_page(const void *x)
> >  {
> >  	struct page *page = virt_to_page(x);
> >  
> > -	/*
> > -	 * We don't need to worry about synchronization of tail flag
> > -	 * when we call virt_to_head_page() since it is only called for
> > -	 * already allocated page and this page won't be freed until
> > -	 * this virt_to_head_page() is finished. So use _fast variant.
> > -	 */
> > -	return compound_head_fast(page);
> > +	return compound_head(page);
> >  }
> >  
> >  /*
> > @@ -1494,8 +1448,7 @@ static inline bool ptlock_init(struct page *page)
> >  	 * with 0. Make sure nobody took it in use in between.
> >  	 *
> >  	 * It can happen if arch try to use slab for page table allocation:
> > -	 * slab code uses page->slab_cache and page->first_page (for tail
> > -	 * pages), which share storage with page->ptl.
> > +	 * slab code uses page->slab_cache, which shares storage with page->ptl.
> >  	 */
> >  	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
> >  	if (!ptlock_alloc(page))
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 63cdfe7ec336..ad47b61d96ce 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -120,7 +120,12 @@ struct page {
> >  		};
> >  	};
> >  
> > -	/* Third double word block */
> > +	/*
> > +	 * Third double word block
> > +	 *
> > +	 * WARNING: bit 0 of the first word encodes PageTail and *must* be 0
> > +	 * for non-tail pages.
> > +	 */
> >  	union {
> >  		struct list_head lru;	/* Pageout list, eg. active_list
> >  					 * protected by zone->lru_lock !
> > @@ -143,6 +148,7 @@ struct page {
> >  						 */
> >  		/* First tail page of compound page */
> >  		struct {
> > +			unsigned long compound_head; /* If bit zero is set */
> >  #ifdef CONFIG_64BIT
> >  			unsigned int compound_dtor;
> >  			unsigned int compound_order;
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 41c93844fb1d..9b865158e452 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -86,12 +86,7 @@ enum pageflags {
> >  	PG_private,		/* If pagecache, has fs-private data */
> >  	PG_private_2,		/* If pagecache, has fs aux data */
> >  	PG_writeback,		/* Page is under writeback */
> > -#ifdef CONFIG_PAGEFLAGS_EXTENDED
> >  	PG_head,		/* A head page */
> > -	PG_tail,		/* A tail page */
> > -#else
> > -	PG_compound,		/* A compound page */
> > -#endif
> >  	PG_swapcache,		/* Swap page: swp_entry_t in private */
> >  	PG_mappedtodisk,	/* Has blocks allocated on-disk */
> >  	PG_reclaim,		/* To be reclaimed asap */
> > @@ -387,85 +382,46 @@ static inline void set_page_writeback_keepwrite(struct page *page)
> >  	test_set_page_writeback_keepwrite(page);
> >  }
> >  
> > -#ifdef CONFIG_PAGEFLAGS_EXTENDED
> > -/*
> > - * System with lots of page flags available. This allows separate
> > - * flags for PageHead() and PageTail() checks of compound pages so that bit
> > - * tests can be used in performance sensitive paths. PageCompound is
> > - * generally not used in hot code paths except arch/powerpc/mm/init_64.c
> > - * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
> > - * and avoid handling those in real mode.
> > - */
> >  __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
> > -__PAGEFLAG(Tail, tail)
> >  
> > -static inline int PageCompound(struct page *page)
> > -{
> > -	return page->flags & ((1L << PG_head) | (1L << PG_tail));
> > -
> > -}
> > -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > -static inline void ClearPageCompound(struct page *page)
> > +static inline int PageTail(struct page *page)
> >  {
> > -	BUG_ON(!PageHead(page));
> > -	ClearPageHead(page);
> > +	return READ_ONCE(page->compound_head) & 1;
> >  }
> > -#endif
> > -
> > -#define PG_head_mask ((1L << PG_head))
> >  
> > -#else
> > -/*
> > - * Reduce page flag use as much as possible by overlapping
> > - * compound page flags with the flags used for page cache pages. Possible
> > - * because PageCompound is always set for compound pages and not for
> > - * pages on the LRU and/or pagecache.
> > - */
> > -TESTPAGEFLAG(Compound, compound)
> > -__SETPAGEFLAG(Head, compound)  __CLEARPAGEFLAG(Head, compound)
> > -
> > -/*
> > - * PG_reclaim is used in combination with PG_compound to mark the
> > - * head and tail of a compound page. This saves one page flag
> > - * but makes it impossible to use compound pages for the page cache.
> > - * The PG_reclaim bit would have to be used for reclaim or readahead
> > - * if compound pages enter the page cache.
> > - *
> > - * PG_compound & PG_reclaim	=> Tail page
> > - * PG_compound & ~PG_reclaim	=> Head page
> > - */
> > -#define PG_head_mask ((1L << PG_compound))
> > -#define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))
> > -
> > -static inline int PageHead(struct page *page)
> > +static inline void set_compound_head(struct page *page, struct page *head)
> >  {
> > -	return ((page->flags & PG_head_tail_mask) == PG_head_mask);
> > +	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
> >  }
> >  
> > -static inline int PageTail(struct page *page)
> > +static inline void clear_compound_head(struct page *page)
> >  {
> > -	return ((page->flags & PG_head_tail_mask) == PG_head_tail_mask);
> > +	WRITE_ONCE(page->compound_head, 0);
> >  }
> >  
> > -static inline void __SetPageTail(struct page *page)
> > +static inline struct page *compound_head(struct page *page)
> >  {
> > -	page->flags |= PG_head_tail_mask;
> > +	unsigned long head = READ_ONCE(page->compound_head);
> > +
> > +	if (unlikely(head & 1))
> > +		return (struct page *) (head - 1);
> > +	return page;
> >  }
> >  
> > -static inline void __ClearPageTail(struct page *page)
> > +static inline int PageCompound(struct page *page)
> >  {
> > -	page->flags &= ~PG_head_tail_mask;
> > -}
> > +	return PageHead(page) || PageTail(page);
> >  
> > +}
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  static inline void ClearPageCompound(struct page *page)
> >  {
> > -	BUG_ON((page->flags & PG_head_tail_mask) != (1 << PG_compound));
> > -	clear_bit(PG_compound, &page->flags);
> > +	BUG_ON(!PageHead(page));
> > +	ClearPageHead(page);
> >  }
> >  #endif
> >  
> > -#endif /* !PAGEFLAGS_EXTENDED */
> > +#define PG_head_mask ((1L << PG_head))
> >  
> >  #ifdef CONFIG_HUGETLB_PAGE
> >  int PageHuge(struct page *page);
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index e79de2bd12cd..454579d31081 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -200,18 +200,6 @@ config MEMORY_HOTREMOVE
> >  	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
> >  	depends on MIGRATION
> >  
> > -#
> > -# If we have space for more page flags then we can enable additional
> > -# optimizations and functionality.
> > -#
> > -# Regular Sparsemem takes page flag bits for the sectionid if it does not
> > -# use a virtual memmap. Disable extended page flags for 32 bit platforms
> > -# that require the use of a sectionid in the page flags.
> > -#
> > -config PAGEFLAGS_EXTENDED
> > -	def_bool y
> > -	depends on 64BIT || SPARSEMEM_VMEMMAP || !SPARSEMEM
> > -
> >  # Heavily threaded applications may benefit from splitting the mm-wide
> >  # page_table_lock, so that faults on different parts of the user address
> >  # space can be handled with less contention: split it at this NR_CPUS.
> > diff --git a/mm/debug.c b/mm/debug.c
> > index 76089ddf99ea..205e5ef957ab 100644
> > --- a/mm/debug.c
> > +++ b/mm/debug.c
> > @@ -25,12 +25,7 @@ static const struct trace_print_flags pageflag_names[] = {
> >  	{1UL << PG_private,		"private"	},
> >  	{1UL << PG_private_2,		"private_2"	},
> >  	{1UL << PG_writeback,		"writeback"	},
> > -#ifdef CONFIG_PAGEFLAGS_EXTENDED
> >  	{1UL << PG_head,		"head"		},
> > -	{1UL << PG_tail,		"tail"		},
> > -#else
> > -	{1UL << PG_compound,		"compound"	},
> > -#endif
> >  	{1UL << PG_swapcache,		"swapcache"	},
> >  	{1UL << PG_mappedtodisk,	"mappedtodisk"	},
> >  	{1UL << PG_reclaim,		"reclaim"	},
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 097c7a4bfbd9..330377f83ac7 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1686,8 +1686,7 @@ static void __split_huge_page_refcount(struct page *page,
> >  				      (1L << PG_unevictable)));
> >  		page_tail->flags |= (1L << PG_dirty);
> >  
> > -		/* clear PageTail before overwriting first_page */
> > -		smp_wmb();
> > +		clear_compound_head(page_tail);
> >  
> >  		/*
> >  		 * __split_huge_page_splitting() already set the
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 8ea74caa1fa8..53c0709fd87b 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -824,9 +824,8 @@ static void destroy_compound_gigantic_page(struct page *page,
> >  	struct page *p = page + 1;
> >  
> >  	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
> > -		__ClearPageTail(p);
> > +		clear_compound_head(p);
> >  		set_page_refcounted(p);
> > -		p->first_page = NULL;
> >  	}
> >  
> >  	set_compound_order(page, 0);
> > @@ -1099,10 +1098,7 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
> >  		 */
> >  		__ClearPageReserved(p);
> >  		set_page_count(p, 0);
> > -		p->first_page = page;
> > -		/* Make sure p->first_page is always valid for PageTail() */
> > -		smp_wmb();
> > -		__SetPageTail(p);
> > +		set_compound_head(p, page);
> >  	}
> >  }
> >  
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 36b23f1e2ca6..89e21a07080a 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -61,9 +61,9 @@ static inline void __get_page_tail_foll(struct page *page,
> >  	 * speculative page access (like in
> >  	 * page_cache_get_speculative()) on tail pages.
> >  	 */
> > -	VM_BUG_ON_PAGE(atomic_read(&page->first_page->_count) <= 0, page);
> > +	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
> >  	if (get_page_head)
> > -		atomic_inc(&page->first_page->_count);
> > +		atomic_inc(&compound_head(page)->_count);
> >  	get_huge_page_tail(page);
> >  }
> >  
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 1f4446a90cef..4d1a5de9653d 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -787,8 +787,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
> >  #define lru		(1UL << PG_lru)
> >  #define swapbacked	(1UL << PG_swapbacked)
> >  #define head		(1UL << PG_head)
> > -#define tail		(1UL << PG_tail)
> > -#define compound	(1UL << PG_compound)
> >  #define slab		(1UL << PG_slab)
> >  #define reserved	(1UL << PG_reserved)
> >  
> > @@ -811,12 +809,7 @@ static struct page_state {
> >  	 */
> >  	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },
> >  
> > -#ifdef CONFIG_PAGEFLAGS_EXTENDED
> >  	{ head,		head,		MF_MSG_HUGE,		me_huge_page },
> > -	{ tail,		tail,		MF_MSG_HUGE,		me_huge_page },
> > -#else
> > -	{ compound,	compound,	MF_MSG_HUGE,		me_huge_page },
> > -#endif
> >  
> >  	{ sc|dirty,	sc|dirty,	MF_MSG_DIRTY_SWAPCACHE,	me_swapcache_dirty },
> >  	{ sc|dirty,	sc,		MF_MSG_CLEAN_SWAPCACHE,	me_swapcache_clean },
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index beab86e694b2..16c3f97a7d30 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -426,7 +426,7 @@ out:
> >   *
> >   * The remaining PAGE_SIZE pages are called "tail pages".
> >   *
> > - * All pages have PG_compound set.  All tail pages have their ->first_page
> > + * All pages have PG_compound set.  All tail pages have their compound_head()
> >   * pointing at the head page.
> >   *
> >   * The first tail page's ->lru.next holds the address of the compound page's
> > @@ -450,10 +450,7 @@ void prep_compound_page(struct page *page, unsigned long order)
> >  	for (i = 1; i < nr_pages; i++) {
> >  		struct page *p = page + i;
> >  		set_page_count(p, 0);
> > -		p->first_page = page;
> > -		/* Make sure p->first_page is always valid for PageTail() */
> > -		smp_wmb();
> > -		__SetPageTail(p);
> > +		set_compound_head(p, page);
> >  	}
> >  }
> >  
> > @@ -828,17 +825,24 @@ static void free_one_page(struct zone *zone,
> >  
> >  static int free_tail_pages_check(struct page *head_page, struct page *page)
> >  {
> > -	if (!IS_ENABLED(CONFIG_DEBUG_VM))
> > -		return 0;
> > +	int ret = 1;
> > +
> > +	if (!IS_ENABLED(CONFIG_DEBUG_VM)) {
> > +		ret = 0;
> > +		goto out;
> > +	}
> >  	if (unlikely(!PageTail(page))) {
> >  		bad_page(page, "PageTail not set", 0);
> > -		return 1;
> > +		goto out;
> >  	}
> > -	if (unlikely(page->first_page != head_page)) {
> > -		bad_page(page, "first_page not consistent", 0);
> > -		return 1;
> > +	if (unlikely(compound_head(page) != head_page)) {
> > +		bad_page(page, "compound_head not consistent", 0);
> > +		goto out;
> >  	}
> > -	return 0;
> > +	ret = 0;
> > +out:
> > +	clear_compound_head(page);
> > +	return ret;
> >  }
> >  
> >  static void __meminit __init_single_page(struct page *page, unsigned long pfn,
> > diff --git a/mm/swap.c b/mm/swap.c
> > index a3a0a2f1f7c3..faa9e1687dea 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -200,7 +200,7 @@ out_put_single:
> >  				__put_single_page(page);
> >  			return;
> >  		}
> > -		VM_BUG_ON_PAGE(page_head != page->first_page, page);
> > +		VM_BUG_ON_PAGE(page_head != compound_head(page), page);
> >  		/*
> >  		 * We can release the refcount taken by
> >  		 * get_page_unless_zero() now that
> > @@ -261,7 +261,7 @@ static void put_compound_page(struct page *page)
> >  	 *  Case 3 is possible, as we may race with
> >  	 *  __split_huge_page_refcount tearing down a THP page.
> >  	 */
> > -	page_head = compound_head_by_tail(page);
> > +	page_head = compound_head(page);
> >  	if (!__compound_tail_refcounted(page_head))
> >  		put_unrefcounted_compound_page(page_head, page);
> >  	else
> > -- 
> > 2.5.0
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 3/4] mm: pack compound_dtor and compound_order into one word in struct page
  2015-08-18 15:43     ` Michal Hocko
@ 2015-08-18 18:22       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2015-08-18 18:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill A. Shutemov, Andrew Morton, Hugh Dickins,
	Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	David Rientjes, linux-kernel, linux-mm

On Tue, Aug 18, 2015 at 05:43:00PM +0200, Michal Hocko wrote:
> On Mon 17-08-15 18:09:04, Kirill A. Shutemov wrote:
> > The patch halves space occupied by compound_dtor and compound_order in
> > struct page.
> > 
> > For compound_order, it's trivial long -> int/short conversion.
> > 
> > For get_compound_page_dtor(), we now use a hardcoded table for destructor
> > lookup and store its index in the struct page instead of a direct pointer
> > to the destructor. It shouldn't be much trouble to maintain the table: we
> > have only two destructors and NULL currently.
> > 
> > This patch frees up one word in tail pages for reuse. This is preparation
> > for the next patch.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> Reviewed-by: Michal Hocko <mhocko@suse.com>
> 
> [...]
> > @@ -145,8 +143,13 @@ struct page {
> >  						 */
> >  		/* First tail page of compound page */
> >  		struct {
> > -			compound_page_dtor *compound_dtor;
> > -			unsigned long compound_order;
> > +#ifdef CONFIG_64BIT
> > +			unsigned int compound_dtor;
> > +			unsigned int compound_order;
> > +#else
> > +			unsigned short int compound_dtor;
> > +			unsigned short int compound_order;
> > +#endif
> >  		};
> 
> Why do we need this ifdef? We can go with short for both 32b and 64b
> AFAICS.

My assumption was that operations on ints can be faster on some
[micro]architectures. I'm not sure it's ever true.

> We do not use compound_order for anything other than the order, right?

Right.

> While I am looking at this, it seems we are juggling with the type for
> order quite a lot - int, unsigned int and even unsigned long.

Yeah, it's a mess. I'll see what I can clean up in v3.
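
The kind of unification meant here, as a hypothetical sketch (these
signatures are illustrative, not from this series): pick one type, say
unsigned int, and use it for orders end to end:

void prep_compound_page(struct page *page, unsigned int order);
void set_compound_order(struct page *page, unsigned int order);
unsigned int compound_order(struct page *page);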

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 3/4] mm: pack compound_dtor and compound_order into one word in struct page
  2015-08-18 16:05     ` Michal Hocko
@ 2015-08-18 18:22       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2015-08-18 18:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill A. Shutemov, Andrew Morton, Hugh Dickins,
	Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	David Rientjes, linux-kernel, linux-mm

On Tue, Aug 18, 2015 at 06:05:31PM +0200, Michal Hocko wrote:
> On Mon 17-08-15 18:09:04, Kirill A. Shutemov wrote:
> [...]
> > +/* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
> > +enum {
> > +	NULL_COMPOUND_DTOR,
> > +	COMPOUND_PAGE_DTOR,
> > +	HUGETLB_PAGE_DTOR,
> > +	NR_COMPOUND_DTORS,
> > +};
> [...]
> > +static void free_compound_page(struct page *page);
> > +compound_page_dtor * const compound_page_dtors[] = {
> > +	NULL,
> > +	free_compound_page,
> > +	free_huge_page,
> > +};
> > +
> 
> Both need ifdef CONFIG_HUGETLB_PAGE as my compile test battery just found
> out.

I'll fix that.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 4/4] mm: make compound_head() robust
  2015-08-18 16:41       ` Michal Hocko
@ 2015-08-18 18:24         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2015-08-18 18:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill A. Shutemov, Andrew Morton, Hugh Dickins,
	Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	David Rientjes, linux-kernel, linux-mm

On Tue, Aug 18, 2015 at 06:41:13PM +0200, Michal Hocko wrote:
> On Tue 18-08-15 13:20:22, Michal Hocko wrote:
> > On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote:
> > > Hugh has pointed that compound_head() call can be unsafe in some
> > > context. There's one example:
> > > 
> > > 	CPU0					CPU1
> > > 
> > > isolate_migratepages_block()
> > >   page_count()
> > >     compound_head()
> > >       !!PageTail() == true
> > > 					put_page()
> > > 					  tail->first_page = NULL
> > >       head = tail->first_page
> > > 					alloc_pages(__GFP_COMP)
> > > 					   prep_compound_page()
> > > 					     tail->first_page = head
> > > 					     __SetPageTail(p);
> > >       !!PageTail() == true
> > >     <head == NULL dereferencing>
> > > 
> > > The race is purely theoretical. I don't think it's possible to trigger
> > > it in practice. But who knows.
> > > 
> > > We can fix the race by changing how we encode PageTail() and
> > > compound_head() within struct page, so that we can update them in one
> > > shot.
> > > 
> > > The patch introduces page->compound_head into third double word block in
> > > front of compound_dtor and compound_order. That means it shares storage
> > > space with:
> > > 
> > >  - page->lru.next;
> > >  - page->next;
> > >  - page->rcu_head.next;
> > >  - page->pmd_huge_pte;
> > > 
> > > That's too long a list to be absolutely sure, but it looks like nobody
> > > uses bit 0 of the word. It can be used to encode PageTail(). And if the
> > > bit is set, the rest of the word is a pointer to the head page.
> > 
> > I didn't look too closely but the general idea makes sense to me and the
> > overall code simplification is sound. I will give it a more detailed
> > review after I sort out other stuff.
> 
> AFAICS page::first_page isn't used outside of the compound page logic, so
> you should remove it in this patch. The rest looks good to me.

I missed it by accident during the rework for v2. Will fix.

> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!
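
For completeness, the encoding boils down to something like this (a sketch
of the idea rather than the exact hunk; set_compound_head() is a name I'm
using for illustration):

	static inline struct page *compound_head(struct page *page)
	{
		/* tail bit and head pointer are read in one shot */
		unsigned long head = READ_ONCE(page->compound_head);

		if (head & 1)
			return (struct page *)(head - 1);
		return page;
	}

	static inline void set_compound_head(struct page *page, struct page *head)
	{
		/* struct page is at least word-aligned, so bit 0 is free */
		WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
	}

Because the PageTail() bit and the head pointer now live in the same word,
a single READ_ONCE() observes both at once, so the interleaving in the
changelog above can no longer pair a set tail bit with a stale or NULL
head pointer.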

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 2/4] zsmalloc: use page->private instead of page->first_page
  2015-08-17 15:09   ` Kirill A. Shutemov
@ 2015-08-25  1:49     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 34+ messages in thread
From: Sergey Senozhatsky @ 2015-08-25  1:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Kirill A. Shutemov, Andrew Morton, Hugh Dickins,
	Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	Michal Hocko, David Rientjes, linux-kernel, linux-mm

Add Minchan to the thread.

	-ss

On (08/17/15 18:09), Kirill A. Shutemov wrote:
> We are going to rework how compound_head() works. It will not use
> page->first_page as we have it now.
> 
> The only other user of page->first_page beyond compound pages is
> zsmalloc.
> 
> Let's use page->private instead of page->first_page here. It occupies
> the same storage space.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/zsmalloc.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 0a7f81aa2249..a85754e69879 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -16,7 +16,7 @@
>   * struct page(s) to form a zspage.
>   *
>   * Usage of struct page fields:
> - *	page->first_page: points to the first component (0-order) page
> + *	page->private: points to the first component (0-order) page
>   *	page->index (union with page->freelist): offset of the first object
>   *		starting in this page. For the first page, this is
>   *		always 0, so we use this field (aka freelist) to point
> @@ -26,8 +26,7 @@
>   *
>   *	For _first_ page only:
>   *
> - *	page->private (union with page->first_page): refers to the
> - *		component page after the first page
> + *	page->private: refers to the component page after the first page
>   *		If the page is first_page for huge object, it stores handle.
>   *		Look at size_class->huge.
>   *	page->freelist: points to the first free object in zspage.
> @@ -770,7 +769,7 @@ static struct page *get_first_page(struct page *page)
>  	if (is_first_page(page))
>  		return page;
>  	else
> -		return page->first_page;
> +		return (struct page *)page_private(page);
>  }
>  
>  static struct page *get_next_page(struct page *page)
> @@ -955,7 +954,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
>  	 * Allocate individual pages and link them together as:
>  	 * 1. first page->private = first sub-page
>  	 * 2. all sub-pages are linked together using page->lru
> -	 * 3. each sub-page is linked to the first page using page->first_page
> +	 * 3. each sub-page is linked to the first page using page->private
>  	 *
>  	 * For each size class, First/Head pages are linked together using
>  	 * page->lru. Also, we set PG_private to identify the first page
> @@ -980,7 +979,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
>  		if (i == 1)
>  			set_page_private(first_page, (unsigned long)page);
>  		if (i >= 1)
> -			page->first_page = first_page;
> +			set_page_private(first_page, (unsigned long)first_page);
>  		if (i >= 2)
>  			list_add(&page->lru, &prev_page->lru);
>  		if (i == class->pages_per_zspage - 1)	/* last page */
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 2/4] zsmalloc: use page->private instead of page->first_page
  2015-08-17 15:09   ` Kirill A. Shutemov
@ 2015-08-25  2:17     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 34+ messages in thread
From: Sergey Senozhatsky @ 2015-08-25  2:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Minchan Kim, Hugh Dickins, Andrea Arcangeli,
	Dave Hansen, Vlastimil Babka, Johannes Weiner, Michal Hocko,
	David Rientjes, linux-kernel, linux-mm, Sergey Senozhatsky

On (08/17/15 18:09), Kirill A. Shutemov wrote:
[..]
> @@ -980,7 +979,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
>  		if (i == 1)
>  			set_page_private(first_page, (unsigned long)page);
>  		if (i >= 1)
> -			page->first_page = first_page;
> +			set_page_private(first_page, (unsigned long)first_page);

This patch breaks zram/zsmalloc.

Shouldn't it be `page->private = first_page' instead of
`first_page->private = first_page'? IOW:

-	set_page_private(first_page, (unsigned long)first_page);
+	set_page_private(page, (unsigned long)first_page);

?

	-ss

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2 2/4] zsmalloc: use page->private instead of page->first_page
  2015-08-25  2:17     ` Sergey Senozhatsky
@ 2015-08-25 17:10       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2015-08-25 17:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Kirill A. Shutemov, Andrew Morton, Minchan Kim, Hugh Dickins,
	Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Johannes Weiner,
	Michal Hocko, David Rientjes, linux-kernel, linux-mm,
	Sergey Senozhatsky

On Tue, Aug 25, 2015 at 11:17:35AM +0900, Sergey Senozhatsky wrote:
> On (08/17/15 18:09), Kirill A. Shutemov wrote:
> [..]
> > @@ -980,7 +979,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
> >  		if (i == 1)
> >  			set_page_private(first_page, (unsigned long)page);
> >  		if (i >= 1)
> > -			page->first_page = first_page;
> > +			set_page_private(first_page, (unsigned long)first_page);
> 
> This patch breaks zram/zsmalloc.
> 
> Shouldn't it be `page->private = first_page' instead of
> `first_page->private = first_page'? IOW:
> 
> -	set_page_private(first_page, (unsigned long)first_page);
> +	set_page_private(page, (unsigned long)first_page);
> 
> ?

Good catch. Thanks!
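
So the hunk should presumably end up as (a sketch of the fix):

		if (i == 1)
			/* first page's ->private points at the second sub-page */
			set_page_private(first_page, (unsigned long)page);
		if (i >= 1)
			/* each sub-page's ->private points back at the first page */
			set_page_private(page, (unsigned long)first_page);

i.e. exactly what the updated comment at the top of mm/zsmalloc.c
describes.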

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2015-08-25 17:10 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-17 15:09 [PATCHv2 0/4] Fix compound_head() race Kirill A. Shutemov
2015-08-17 15:09 ` [PATCHv2 1/4] mm: drop page->slab_page Kirill A. Shutemov
2015-08-18  0:43   ` David Rientjes
2015-08-17 15:09 ` [PATCHv2 2/4] zsmalloc: use page->private instead of page->first_page Kirill A. Shutemov
2015-08-25  1:49   ` Sergey Senozhatsky
2015-08-25  2:17   ` Sergey Senozhatsky
2015-08-25 17:10     ` Kirill A. Shutemov
2015-08-17 15:09 ` [PATCHv2 3/4] mm: pack compound_dtor and compound_order into one word in struct page Kirill A. Shutemov
2015-08-17 22:59   ` Hugh Dickins
2015-08-18 15:43   ` Michal Hocko
2015-08-18 18:22     ` Kirill A. Shutemov
2015-08-18 16:05   ` Michal Hocko
2015-08-18 18:22     ` Kirill A. Shutemov
2015-08-17 15:09 ` [PATCHv2 4/4] mm: make compound_head() robust Kirill A. Shutemov
2015-08-18 11:20   ` Michal Hocko
2015-08-18 16:41     ` Michal Hocko
2015-08-18 18:24       ` Kirill A. Shutemov
