* [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

Currently we take a naive approach to page flags on compound pages: we
set the flag on the page without considering whether the flag makes sense
for a tail page or for a compound page in general. This patchset tries to
sort this out by defining a per-flag policy on what needs to be done when
a page-flag helper operates on a compound page.

The last patch in the patchset also sanitizes usage of page->mapping for
tail pages. We don't define the meaning of page->mapping for tail pages.
Currently it's always NULL, which can be inconsistent with the head page
and potentially lead to problems.

So far I have caught one case of illegal usage of page flags or ->mapping:
the sound subsystem allocates pages with __GFP_COMP and maps them with
PTEs. This leads to setting the dirty bit on tail pages and to accesses to
the tail page's ->mapping. I don't see any bad behaviour caused by this,
but it's worth fixing anyway.

This patchset makes more sense if you take my THP refcounting work into
account: we will see more compound pages mapped with PTEs and we need to
define the behaviour of flags on compound pages to avoid bugs.

Kirill A. Shutemov (16):
  mm: consolidate all page-flags helpers in <linux/page-flags.h>
  page-flags: trivial cleanup for PageTrans* helpers
  page-flags: introduce page flags policies wrt compound pages
  page-flags: define PG_locked behavior on compound pages
  page-flags: define behavior of FS/IO-related flags on compound pages
  page-flags: define behavior of LRU-related flags on compound pages
  page-flags: define behavior SL*B-related flags on compound pages
  page-flags: define behavior of Xen-related flags on compound pages
  page-flags: define PG_reserved behavior on compound pages
  page-flags: define PG_swapbacked behavior on compound pages
  page-flags: define PG_swapcache behavior on compound pages
  page-flags: define PG_mlocked behavior on compound pages
  page-flags: define PG_uncached behavior on compound pages
  page-flags: define PG_uptodate behavior on compound pages
  page-flags: look on head page if the flag is encoded in page->mapping
  mm: sanitize page->mapping for tail pages

 fs/cifs/file.c             |   8 +-
 include/linux/hugetlb.h    |   7 -
 include/linux/ksm.h        |  17 ---
 include/linux/mm.h         | 122 +----------------
 include/linux/page-flags.h | 317 ++++++++++++++++++++++++++++++++++-----------
 include/linux/pagemap.h    |  25 ++--
 include/linux/poison.h     |   4 +
 mm/filemap.c               |  15 ++-
 mm/huge_memory.c           |   2 +-
 mm/ksm.c                   |   2 +-
 mm/memory-failure.c        |   2 +-
 mm/memory.c                |   2 +-
 mm/migrate.c               |   2 +-
 mm/page_alloc.c            |   7 +
 mm/shmem.c                 |   4 +-
 mm/slub.c                  |   2 +
 mm/swap_state.c            |   4 +-
 mm/util.c                  |   5 +-
 mm/vmscan.c                |   4 +-
 mm/zswap.c                 |   4 +-
 20 files changed, 294 insertions(+), 261 deletions(-)

-- 
2.1.4


* [PATCH 01/16] mm: consolidate all page-flags helpers in <linux/page-flags.h>
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

We have page-flags helper function declarations/definitions spread over
several header files. Let's consolidate them in <linux/page-flags.h>.
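
For readers unfamiliar with the helpers being moved: PageAnon() and
PageKsm() decode type bits kept in the low bits of page->mapping. Below is
a minimal user-space sketch of that encoding, not the kernel code itself.
struct page_model, tag_mapping() and the model_* names are hypothetical
and exist only for illustration; the PAGE_MAPPING_* values are the ones
that appear in the diff.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct page. */
struct page_model {
	void *mapping;	/* pointer with type bits stored in its two low bits */
};

#define PAGE_MAPPING_ANON	1UL
#define PAGE_MAPPING_KSM	2UL
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* Model-only helper: tag a (at least 4-byte aligned) pointer with low bits. */
static inline void *tag_mapping(void *ptr, unsigned long bits)
{
	return (void *)((uintptr_t)ptr | bits);
}

/* Anonymous page: the ANON bit is set in ->mapping. */
static inline int model_page_anon(const struct page_model *page)
{
	return ((uintptr_t)page->mapping & PAGE_MAPPING_ANON) != 0;
}

/* KSM page: both the ANON and the KSM bits are set. */
static inline int model_page_ksm(const struct page_model *page)
{
	return ((uintptr_t)page->mapping & PAGE_MAPPING_FLAGS) ==
		(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
}

/* Strip the tag bits to recover the untagged pointer. */
static inline void *model_page_rmapping(const struct page_model *page)
{
	return (void *)((uintptr_t)page->mapping &
			~(uintptr_t)PAGE_MAPPING_FLAGS);
}
```

The sketch relies on the pointed-to object being aligned so that the two
low bits of its address are zero; that is what leaves room for the tag.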

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/hugetlb.h    |  7 ----
 include/linux/ksm.h        | 17 --------
 include/linux/mm.h         | 81 --------------------------------------
 include/linux/page-flags.h | 96 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 105 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7b5785032049..1a782733a420 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -41,8 +41,6 @@ extern int hugetlb_max_hstate __read_mostly;
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
-int PageHuge(struct page *page);
-
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
@@ -109,11 +107,6 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
-static inline int PageHuge(struct page *page)
-{
-	return 0;
-}
-
 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 {
 }
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 3be6bb18562d..7ae216a39c9e 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -35,18 +35,6 @@ static inline void ksm_exit(struct mm_struct *mm)
 		__ksm_exit(mm);
 }
 
-/*
- * A KSM page is one of those write-protected "shared pages" or "merged pages"
- * which KSM maps into multiple mms, wherever identical anonymous page content
- * is found in VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any
- * anon_vma, but to that page's node of the stable tree.
- */
-static inline int PageKsm(struct page *page)
-{
-	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
-}
-
 static inline struct stable_node *page_stable_node(struct page *page)
 {
 	return PageKsm(page) ? page_rmapping(page) : NULL;
@@ -87,11 +75,6 @@ static inline void ksm_exit(struct mm_struct *mm)
 {
 }
 
-static inline int PageKsm(struct page *page)
-{
-	return 0;
-}
-
 #ifdef CONFIG_MMU
 static inline int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		unsigned long end, int advice, unsigned long *vm_flags)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6571dd78e984..fb1fc38b01ce 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -494,15 +494,6 @@ static inline int page_count(struct page *page)
 	return atomic_read(&compound_head(page)->_count);
 }
 
-#ifdef CONFIG_HUGETLB_PAGE
-extern int PageHeadHuge(struct page *page_head);
-#else /* CONFIG_HUGETLB_PAGE */
-static inline int PageHeadHuge(struct page *page_head)
-{
-	return 0;
-}
-#endif /* CONFIG_HUGETLB_PAGE */
-
 static inline bool __compound_tail_refcounted(struct page *page)
 {
 	return !PageSlab(page) && !PageHeadHuge(page);
@@ -571,53 +562,6 @@ static inline void init_page_count(struct page *page)
 	atomic_set(&page->_count, 1);
 }
 
-/*
- * PageBuddy() indicate that the page is free and in the buddy system
- * (see mm/page_alloc.c).
- *
- * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
- * -2 so that an underflow of the page_mapcount() won't be mistaken
- * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE. -128 can be created very
- * efficiently by most CPU architectures.
- */
-#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)
-
-static inline int PageBuddy(struct page *page)
-{
-	return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
-}
-
-static inline void __SetPageBuddy(struct page *page)
-{
-	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
-	atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
-}
-
-static inline void __ClearPageBuddy(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageBuddy(page), page);
-	atomic_set(&page->_mapcount, -1);
-}
-
-#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
-
-static inline int PageBalloon(struct page *page)
-{
-	return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE;
-}
-
-static inline void __SetPageBalloon(struct page *page)
-{
-	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
-	atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
-}
-
-static inline void __ClearPageBalloon(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageBalloon(page), page);
-	atomic_set(&page->_mapcount, -1);
-}
-
 void put_page(struct page *page);
 void put_pages_list(struct list_head *pages);
 
@@ -1006,26 +950,6 @@ void page_address_init(void);
 #define page_address_init()  do { } while(0)
 #endif
 
-/*
- * On an anonymous page mapped into a user virtual memory area,
- * page->mapping points to its anon_vma, not to a struct address_space;
- * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
- *
- * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
- * and then page->mapping points, not to an anon_vma, but to a private
- * structure which KSM associates with that merged page.  See ksm.h.
- *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
- *
- * Please note that, confusingly, "page_mapping" refers to the inode
- * address_space which maps the page from disk; whereas "page_mapped"
- * refers to user virtual address space into which the page is mapped.
- */
-#define PAGE_MAPPING_ANON	1
-#define PAGE_MAPPING_KSM	2
-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
-
 extern struct address_space *page_mapping(struct page *page);
 
 /* Neutral page->mapping pointer to address_space or anon_vma or other */
@@ -1045,11 +969,6 @@ struct address_space *page_file_mapping(struct page *page)
 	return page->mapping;
 }
 
-static inline int PageAnon(struct page *page)
-{
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
-}
-
 /*
  * Return the pagecache index of the passed page.  Regular pagecache pages
  * use ->index whereas swapcache pages use ->private
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index c851ff92d5b3..84d10b65cec6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -289,6 +289,47 @@ PAGEFLAG_FALSE(HWPoison)
 #define __PG_HWPOISON 0
 #endif
 
+/*
+ * On an anonymous page mapped into a user virtual memory area,
+ * page->mapping points to its anon_vma, not to a struct address_space;
+ * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
+ *
+ * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
+ * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
+ * and then page->mapping points, not to an anon_vma, but to a private
+ * structure which KSM associates with that merged page.  See ksm.h.
+ *
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
+ *
+ * Please note that, confusingly, "page_mapping" refers to the inode
+ * address_space which maps the page from disk; whereas "page_mapped"
+ * refers to user virtual address space into which the page is mapped.
+ */
+#define PAGE_MAPPING_ANON	1
+#define PAGE_MAPPING_KSM	2
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+
+static inline int PageAnon(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+#ifdef CONFIG_KSM
+/*
+ * A KSM page is one of those write-protected "shared pages" or "merged pages"
+ * which KSM maps into multiple mms, wherever identical anonymous page content
+ * is found in VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any
+ * anon_vma, but to that page's node of the stable tree.
+ */
+static inline int PageKsm(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+}
+#else
+TESTPAGEFLAG_FALSE(Ksm)
+#endif
+
 u64 stable_page_flags(struct page *page);
 
 static inline int PageUptodate(struct page *page)
@@ -426,6 +467,14 @@ static inline void ClearPageCompound(struct page *page)
 
 #endif /* !PAGEFLAGS_EXTENDED */
 
+#ifdef CONFIG_HUGETLB_PAGE
+int PageHuge(struct page *page);
+int PageHeadHuge(struct page *page);
+#else
+TESTPAGEFLAG_FALSE(Huge)
+TESTPAGEFLAG_FALSE(HeadHuge)
+#endif
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * PageHuge() only returns true for hugetlbfs pages, but not for
@@ -480,6 +529,53 @@ static inline int PageTransTail(struct page *page)
 #endif
 
 /*
+ * PageBuddy() indicate that the page is free and in the buddy system
+ * (see mm/page_alloc.c).
+ *
+ * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
+ * -2 so that an underflow of the page_mapcount() won't be mistaken
+ * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE. -128 can be created very
+ * efficiently by most CPU architectures.
+ */
+#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)
+
+static inline int PageBuddy(struct page *page)
+{
+	return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
+}
+
+static inline void __SetPageBuddy(struct page *page)
+{
+	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+	atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
+}
+
+static inline void __ClearPageBuddy(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageBuddy(page), page);
+	atomic_set(&page->_mapcount, -1);
+}
+
+#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
+
+static inline int PageBalloon(struct page *page)
+{
+	return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE;
+}
+
+static inline void __SetPageBalloon(struct page *page)
+{
+	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+	atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
+}
+
+static inline void __ClearPageBalloon(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageBalloon(page), page);
+	atomic_set(&page->_mapcount, -1);
+}
+
+/*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
  */
-- 
2.1.4


* [PATCH 02/16] page-flags: trivial cleanup for PageTrans* helpers
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

Use TESTPAGEFLAG_FALSE() to make the !CONFIG_TRANSPARENT_HUGEPAGE stubs
for the PageTrans* helpers a bit cleaner.
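
As a rough illustration of why this is shorter, a generator macro of this
kind can be sketched as follows. This is a simplified model written for
this note, not a copy of the definition in <linux/page-flags.h>; the
opaque struct page declaration and the (void) cast are assumptions made
so that the snippet compiles on its own.

```c
/* Opaque stand-in; the stubs never dereference the pointer. */
struct page;

/*
 * Model of a "feature compiled out" generator: expands to a
 * Page<name>() test helper that always reports false.
 */
#define TESTPAGEFLAG_FALSE(uname)					\
static inline int Page##uname(const struct page *page)			\
{									\
	(void)page;	/* unused on purpose */				\
	return 0;							\
}

/* One line per helper instead of a five-line open-coded stub. */
TESTPAGEFLAG_FALSE(TransHuge)
TESTPAGEFLAG_FALSE(TransCompound)
TESTPAGEFLAG_FALSE(TransTail)
```

Callers are unaffected: PageTransHuge(page) still compiles and evaluates
to 0 when the feature is configured out.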

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 84d10b65cec6..327aabd9792e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -511,21 +511,9 @@ static inline int PageTransTail(struct page *page)
 }
 
 #else
-
-static inline int PageTransHuge(struct page *page)
-{
-	return 0;
-}
-
-static inline int PageTransCompound(struct page *page)
-{
-	return 0;
-}
-
-static inline int PageTransTail(struct page *page)
-{
-	return 0;
-}
+TESTPAGEFLAG_FALSE(TransHuge)
+TESTPAGEFLAG_FALSE(TransCompound)
+TESTPAGEFLAG_FALSE(TransTail)
 #endif
 
 /*
-- 
2.1.4


* [PATCH 03/16] page-flags: introduce page flags policies wrt compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

This patch adds a third argument to the macros which create function
definitions for page flags. This argument defines how page-flags helpers
behave on compound pages.

For now we define four policies:

 - ANY: the helper function operates on the page it gets, regardless of
   whether it's a non-compound, head or tail page.

 - HEAD: the helper function operates on the head page of the compound
   page if it gets a tail page.

 - NO_TAIL: only head and non-compound pages are acceptable for this
   helper function.

 - NO_COMPOUND: only non-compound pages are acceptable for this helper
   function.

For now we use policy ANY for all helpers, which matches current
behaviour.

We do not enforce the policy for TESTPAGEFLAG, because we have flags
checked for random pages all over the kernel. A notable exception to
this is PageTransHuge(), which triggers VM_BUG_ON() for tail pages.
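
As a rough illustration of the four policies, here is a userspace sketch
modelled on the macros in the diff below. It is not kernel code:
struct mock_page, the MOCK_PG_* bits, the policy_*() functions and the
MOCK_*PAGEFLAG macros are all hypothetical names, and returning NULL
stands in for the point where the kernel would hit VM_BUG_ON_PAGE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
enum { MOCK_PG_dirty = 0, MOCK_PG_head = 1, MOCK_PG_tail = 2 };

struct mock_page {
	unsigned long flags;
	struct mock_page *first_page;	/* meaningful only for tail pages */
};

static inline bool mock_test(const struct mock_page *p, int bit)
{
	return ((p->flags >> bit) & 1UL) != 0;
}

static inline bool mock_is_tail(const struct mock_page *p)
{
	return mock_test(p, MOCK_PG_tail);
}

static inline bool mock_is_compound(const struct mock_page *p)
{
	return mock_test(p, MOCK_PG_head) || mock_is_tail(p);
}

static inline struct mock_page *mock_compound_head(struct mock_page *p)
{
	return mock_is_tail(p) ? p->first_page : p;
}

/*
 * Each policy returns the page the helper should operate on.
 * NULL marks the cases where the kernel macro would VM_BUG_ON_PAGE().
 */
static inline struct mock_page *policy_any(struct mock_page *p, bool enforce)
{
	(void)enforce;
	return p;
}

static inline struct mock_page *policy_head(struct mock_page *p, bool enforce)
{
	(void)enforce;
	return mock_compound_head(p);
}

static inline struct mock_page *policy_no_tail(struct mock_page *p, bool enforce)
{
	if (enforce)
		return mock_is_tail(p) ? NULL : p;
	/* Not enforced (test helpers): fall back to the head page. */
	return mock_compound_head(p);
}

static inline struct mock_page *policy_no_compound(struct mock_page *p, bool enforce)
{
	if (enforce && mock_is_compound(p))
		return NULL;
	return p;
}

/* Generators in the spirit of TESTPAGEFLAG/SETPAGEFLAG. */
#define MOCK_TESTPAGEFLAG(uname, bit, policy)				\
static inline bool MockPage##uname(struct mock_page *p)		\
	{ return mock_test(policy(p, false), bit); }

#define MOCK_SETPAGEFLAG(uname, bit, policy)				\
static inline bool MockSetPage##uname(struct mock_page *p)		\
{									\
	struct mock_page *target = policy(p, true);			\
	if (!target)							\
		return false;	/* would be a VM_BUG_ON_PAGE() */	\
	target->flags |= 1UL << (bit);					\
	return true;							\
}

MOCK_TESTPAGEFLAG(Dirty, MOCK_PG_dirty, policy_head)
MOCK_SETPAGEFLAG(Dirty, MOCK_PG_dirty, policy_head)
```

With the HEAD policy chosen here purely for demonstration, setting the
flag through a tail page lands on the head page, so a later test through
either page sees the same bit.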

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h         |  40 ---------
 include/linux/page-flags.h | 198 ++++++++++++++++++++++++++++++---------------
 2 files changed, 134 insertions(+), 104 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fb1fc38b01ce..bcf37dacbee3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -433,46 +433,6 @@ static inline void compound_unlock_irqrestore(struct page *page,
 #endif
 }
 
-static inline struct page *compound_head_by_tail(struct page *tail)
-{
-	struct page *head = tail->first_page;
-
-	/*
-	 * page->first_page may be a dangling pointer to an old
-	 * compound page, so recheck that it is still a tail
-	 * page before returning.
-	 */
-	smp_rmb();
-	if (likely(PageTail(tail)))
-		return head;
-	return tail;
-}
-
-/*
- * Since either compound page could be dismantled asynchronously in THP
- * or we access asynchronously arbitrary positioned struct page, there
- * would be tail flag race. To handle this race, we should call
- * smp_rmb() before checking tail flag. compound_head_by_tail() did it.
- */
-static inline struct page *compound_head(struct page *page)
-{
-	if (unlikely(PageTail(page)))
-		return compound_head_by_tail(page);
-	return page;
-}
-
-/*
- * If we access compound page synchronously such as access to
- * allocated page, there is no need to handle tail flag race, so we can
- * check tail flag directly without any synchronization primitive.
- */
-static inline struct page *compound_head_fast(struct page *page)
-{
-	if (unlikely(PageTail(page)))
-		return page->first_page;
-	return page;
-}
-
 /*
  * The atomic page->_mapcount, starts from -1: so that transitions
  * both from it and to it can be tracked, using atomic_inc_and_test
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 327aabd9792e..32ea62c0ad30 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -134,49 +134,68 @@ enum pageflags {
 
 #ifndef __GENERATING_BOUNDS_H
 
+/* Page flags policies wrt compound pages */
+#define ANY(page, enforce)	page
+#define HEAD(page, enforce)	compound_head(page)
+#define NO_TAIL(page, enforce) ({					\
+		if (enforce)						\
+			VM_BUG_ON_PAGE(PageTail(page), page);		\
+		else							\
+			page = compound_head(page);			\
+		page;})
+#define NO_COMPOUND(page, enforce) ({					\
+		if (enforce)						\
+			VM_BUG_ON_PAGE(PageCompound(page), page);	\
+		page;})
+
 /*
  * Macros to create function definitions for page flags
  */
-#define TESTPAGEFLAG(uname, lname)					\
-static inline int Page##uname(const struct page *page)			\
-			{ return test_bit(PG_##lname, &page->flags); }
+#define TESTPAGEFLAG(uname, lname, policy)				\
+static inline int Page##uname(struct page *page)			\
+	{ return test_bit(PG_##lname, &policy(page, 0)->flags); }
 
-#define SETPAGEFLAG(uname, lname)					\
+#define SETPAGEFLAG(uname, lname, policy)				\
 static inline void SetPage##uname(struct page *page)			\
-			{ set_bit(PG_##lname, &page->flags); }
+	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
 
-#define CLEARPAGEFLAG(uname, lname)					\
+#define CLEARPAGEFLAG(uname, lname, policy)				\
 static inline void ClearPage##uname(struct page *page)			\
-			{ clear_bit(PG_##lname, &page->flags); }
+	{ clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
-#define __SETPAGEFLAG(uname, lname)					\
+#define __SETPAGEFLAG(uname, lname, policy)				\
 static inline void __SetPage##uname(struct page *page)			\
-			{ __set_bit(PG_##lname, &page->flags); }
+	{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
 
-#define __CLEARPAGEFLAG(uname, lname)					\
+#define __CLEARPAGEFLAG(uname, lname, policy)				\
 static inline void __ClearPage##uname(struct page *page)		\
-			{ __clear_bit(PG_##lname, &page->flags); }
+	{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
-#define TESTSETFLAG(uname, lname)					\
+#define TESTSETFLAG(uname, lname, policy)				\
 static inline int TestSetPage##uname(struct page *page)			\
-		{ return test_and_set_bit(PG_##lname, &page->flags); }
+	{ return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }
 
-#define TESTCLEARFLAG(uname, lname)					\
+#define TESTCLEARFLAG(uname, lname, policy)				\
 static inline int TestClearPage##uname(struct page *page)		\
-		{ return test_and_clear_bit(PG_##lname, &page->flags); }
+	{ return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
-#define __TESTCLEARFLAG(uname, lname)					\
+#define __TESTCLEARFLAG(uname, lname, policy)				\
 static inline int __TestClearPage##uname(struct page *page)		\
-		{ return __test_and_clear_bit(PG_##lname, &page->flags); }
+	{ return __test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
-#define PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname)		\
-	SETPAGEFLAG(uname, lname) CLEARPAGEFLAG(uname, lname)
+#define PAGEFLAG(uname, lname, policy)					\
+	TESTPAGEFLAG(uname, lname, policy)				\
+	SETPAGEFLAG(uname, lname, policy)				\
+	CLEARPAGEFLAG(uname, lname, policy)
 
-#define __PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname)		\
-	__SETPAGEFLAG(uname, lname)  __CLEARPAGEFLAG(uname, lname)
+#define __PAGEFLAG(uname, lname, policy)				\
+	TESTPAGEFLAG(uname, lname, policy)				\
+	__SETPAGEFLAG(uname, lname, policy)				\
+	__CLEARPAGEFLAG(uname, lname, policy)
 
-#define TESTSCFLAG(uname, lname)					\
-	TESTSETFLAG(uname, lname) TESTCLEARFLAG(uname, lname)
+#define TESTSCFLAG(uname, lname, policy)				\
+	TESTSETFLAG(uname, lname, policy)				\
+	TESTCLEARFLAG(uname, lname, policy)
 
 #define TESTPAGEFLAG_FALSE(uname)					\
 static inline int Page##uname(const struct page *page) { return 0; }
@@ -205,47 +224,93 @@ static inline int __TestClearPage##uname(struct page *page) { return 0; }
 #define TESTSCFLAG_FALSE(uname)						\
 	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
-struct page;	/* forward declaration */
-
-TESTPAGEFLAG(Locked, locked)
-PAGEFLAG(Error, error) TESTCLEARFLAG(Error, error)
-PAGEFLAG(Referenced, referenced) TESTCLEARFLAG(Referenced, referenced)
-	__SETPAGEFLAG(Referenced, referenced)
-PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty)
-PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru)
-PAGEFLAG(Active, active) __CLEARPAGEFLAG(Active, active)
-	TESTCLEARFLAG(Active, active)
-__PAGEFLAG(Slab, slab)
-PAGEFLAG(Checked, checked)		/* Used by some filesystems */
-PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned)	/* Xen */
-PAGEFLAG(SavePinned, savepinned);			/* Xen */
-PAGEFLAG(Foreign, foreign);				/* Xen */
-PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved)
-PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
-	__SETPAGEFLAG(SwapBacked, swapbacked)
-
-__PAGEFLAG(SlobFree, slob_free)
+/* Forward declarations */
+struct page;
+static inline int PageCompound(struct page *page);
+static inline int PageTail(struct page *page);
+
+static inline struct page *compound_head_by_tail(struct page *tail)
+{
+	struct page *head = tail->first_page;
+
+	/*
+	 * page->first_page may be a dangling pointer to an old
+	 * compound page, so recheck that it is still a tail
+	 * page before returning.
+	 */
+	smp_rmb();
+	if (likely(PageTail(tail)))
+		return head;
+	return tail;
+}
+
+/*
+ * Since either compound page could be dismantled asynchronously in THP
+ * or we access asynchronously arbitrary positioned struct page, there
+ * would be tail flag race. To handle this race, we should call
+ * smp_rmb() before checking tail flag. compound_head_by_tail() did it.
+ */
+static inline struct page *compound_head(struct page *page)
+{
+	if (unlikely(PageTail(page)))
+		return compound_head_by_tail(page);
+	return page;
+}
+
+/*
+ * If we access compound page synchronously such as access to
+ * allocated page, there is no need to handle tail flag race, so we can
+ * check tail flag directly without any synchronization primitive.
+ */
+static inline struct page *compound_head_fast(struct page *page)
+{
+	if (unlikely(PageTail(page)))
+		return page->first_page;
+	return page;
+}
+
+TESTPAGEFLAG(Locked, locked, ANY)
+PAGEFLAG(Error, error, ANY) TESTCLEARFLAG(Error, error, ANY)
+PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
+	__SETPAGEFLAG(Referenced, referenced, ANY)
+PAGEFLAG(Dirty, dirty, ANY) TESTSCFLAG(Dirty, dirty, ANY)
+	__CLEARPAGEFLAG(Dirty, dirty, ANY)
+PAGEFLAG(LRU, lru, ANY) __CLEARPAGEFLAG(LRU, lru, ANY)
+PAGEFLAG(Active, active, ANY) __CLEARPAGEFLAG(Active, active, ANY)
+	TESTCLEARFLAG(Active, active, ANY)
+__PAGEFLAG(Slab, slab, ANY)
+PAGEFLAG(Checked, checked, ANY)		/* Used by some filesystems */
+PAGEFLAG(Pinned, pinned, ANY) TESTSCFLAG(Pinned, pinned, ANY)	/* Xen */
+PAGEFLAG(SavePinned, savepinned, ANY);			/* Xen */
+PAGEFLAG(Foreign, foreign, ANY);				/* Xen */
+PAGEFLAG(Reserved, reserved, ANY) __CLEARPAGEFLAG(Reserved, reserved, ANY)
+PAGEFLAG(SwapBacked, swapbacked, ANY)
+	__CLEARPAGEFLAG(SwapBacked, swapbacked, ANY)
+	__SETPAGEFLAG(SwapBacked, swapbacked, ANY)
+
+__PAGEFLAG(SlobFree, slob_free, ANY)
 
 /*
  * Private page markings that may be used by the filesystem that owns the page
  * for its own purposes.
  * - PG_private and PG_private_2 cause releasepage() and co to be invoked
  */
-PAGEFLAG(Private, private) __SETPAGEFLAG(Private, private)
-	__CLEARPAGEFLAG(Private, private)
-PAGEFLAG(Private2, private_2) TESTSCFLAG(Private2, private_2)
-PAGEFLAG(OwnerPriv1, owner_priv_1) TESTCLEARFLAG(OwnerPriv1, owner_priv_1)
+PAGEFLAG(Private, private, ANY) __SETPAGEFLAG(Private, private, ANY)
+	__CLEARPAGEFLAG(Private, private, ANY)
+PAGEFLAG(Private2, private_2, ANY) TESTSCFLAG(Private2, private_2, ANY)
+PAGEFLAG(OwnerPriv1, owner_priv_1, ANY)
+	TESTCLEARFLAG(OwnerPriv1, owner_priv_1, ANY)
 
 /*
  * Only test-and-set exist for PG_writeback.  The unconditional operators are
  * risky: they bypass page accounting.
  */
-TESTPAGEFLAG(Writeback, writeback) TESTSCFLAG(Writeback, writeback)
-PAGEFLAG(MappedToDisk, mappedtodisk)
+TESTPAGEFLAG(Writeback, writeback, ANY) TESTSCFLAG(Writeback, writeback, ANY)
+PAGEFLAG(MappedToDisk, mappedtodisk, ANY)
 
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
-PAGEFLAG(Reclaim, reclaim) TESTCLEARFLAG(Reclaim, reclaim)
-PAGEFLAG(Readahead, reclaim) TESTCLEARFLAG(Readahead, reclaim)
+PAGEFLAG(Reclaim, reclaim, ANY) TESTCLEARFLAG(Reclaim, reclaim, ANY)
+PAGEFLAG(Readahead, reclaim, ANY) TESTCLEARFLAG(Readahead, reclaim, ANY)
 
 #ifdef CONFIG_HIGHMEM
 /*
@@ -258,31 +323,32 @@ PAGEFLAG_FALSE(HighMem)
 #endif
 
 #ifdef CONFIG_SWAP
-PAGEFLAG(SwapCache, swapcache)
+PAGEFLAG(SwapCache, swapcache, ANY)
 #else
 PAGEFLAG_FALSE(SwapCache)
 #endif
 
-PAGEFLAG(Unevictable, unevictable) __CLEARPAGEFLAG(Unevictable, unevictable)
-	TESTCLEARFLAG(Unevictable, unevictable)
+PAGEFLAG(Unevictable, unevictable, ANY)
+	__CLEARPAGEFLAG(Unevictable, unevictable, ANY)
+	TESTCLEARFLAG(Unevictable, unevictable, ANY)
 
 #ifdef CONFIG_MMU
-PAGEFLAG(Mlocked, mlocked) __CLEARPAGEFLAG(Mlocked, mlocked)
-	TESTSCFLAG(Mlocked, mlocked) __TESTCLEARFLAG(Mlocked, mlocked)
+PAGEFLAG(Mlocked, mlocked, ANY) __CLEARPAGEFLAG(Mlocked, mlocked, ANY)
+	TESTSCFLAG(Mlocked, mlocked, ANY) __TESTCLEARFLAG(Mlocked, mlocked, ANY)
 #else
 PAGEFLAG_FALSE(Mlocked) __CLEARPAGEFLAG_NOOP(Mlocked)
 	TESTSCFLAG_FALSE(Mlocked) __TESTCLEARFLAG_FALSE(Mlocked)
 #endif
 
 #ifdef CONFIG_ARCH_USES_PG_UNCACHED
-PAGEFLAG(Uncached, uncached)
+PAGEFLAG(Uncached, uncached, ANY)
 #else
 PAGEFLAG_FALSE(Uncached)
 #endif
 
 #ifdef CONFIG_MEMORY_FAILURE
-PAGEFLAG(HWPoison, hwpoison)
-TESTSCFLAG(HWPoison, hwpoison)
+PAGEFLAG(HWPoison, hwpoison, ANY)
+TESTSCFLAG(HWPoison, hwpoison, ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
 #else
 PAGEFLAG_FALSE(HWPoison)
@@ -367,7 +433,7 @@ static inline void SetPageUptodate(struct page *page)
 	set_bit(PG_uptodate, &(page)->flags);
 }
 
-CLEARPAGEFLAG(Uptodate, uptodate)
+CLEARPAGEFLAG(Uptodate, uptodate, ANY)
 
 int test_clear_page_writeback(struct page *page);
 int __test_set_page_writeback(struct page *page, bool keep_write);
@@ -396,8 +462,8 @@ static inline void set_page_writeback_keepwrite(struct page *page)
  * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
  * and avoid handling those in real mode.
  */
-__PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
-__PAGEFLAG(Tail, tail)
+__PAGEFLAG(Head, head, ANY) CLEARPAGEFLAG(Head, head, ANY)
+__PAGEFLAG(Tail, tail, ANY)
 
 static inline int PageCompound(struct page *page)
 {
@@ -421,8 +487,8 @@ static inline void ClearPageCompound(struct page *page)
  * because PageCompound is always set for compound pages and not for
  * pages on the LRU and/or pagecache.
  */
-TESTPAGEFLAG(Compound, compound)
-__SETPAGEFLAG(Head, compound)  __CLEARPAGEFLAG(Head, compound)
+TESTPAGEFLAG(Compound, compound, ANY)
+__SETPAGEFLAG(Head, compound, ANY)  __CLEARPAGEFLAG(Head, compound, ANY)
 
 /*
  * PG_reclaim is used in combination with PG_compound to mark the
@@ -636,6 +702,10 @@ static inline int page_has_private(struct page *page)
 	return !!(page->flags & PAGE_FLAGS_PRIVATE);
 }
 
+#undef ANY
+#undef HEAD
+#undef NO_TAIL
+#undef NO_COMPOUND
 #endif /* !__GENERATING_BOUNDS_H */
 
 #endif	/* PAGE_FLAGS_H */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 04/16] page-flags: define PG_locked behavior on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

lock_page() must operate on the whole compound page. It doesn't make
much sense to lock part of a compound page. Change the code to use the
head page's PG_locked if a tail page is passed.

This patch also gets rid of the custom helper functions
__set_page_locked() and __clear_page_locked(). They are replaced with
helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Passing a tail page
to these helpers would trigger VM_BUG_ON().

SLUB uses PG_locked as a bit spin lock. IIUC, tail pages should never
appear there. VM_BUG_ON() is added to make sure that this assumption is
correct.
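
To make the intended locking behaviour concrete, here is a small
userspace sketch. It is only an approximation: struct lock_mock_page and
the mock_*() functions are hypothetical, the bit operations are not
atomic (the real trylock_page() uses test_and_set_bit_lock()), and
resolving the head page on unlock is an assumption about the rest of the
patch rather than something shown in the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
enum { MOCK_PG_locked = 0, MOCK_PG_tail = 1 };

struct lock_mock_page {
	unsigned long flags;
	struct lock_mock_page *first_page;	/* meaningful only for tail pages */
};

static inline struct lock_mock_page *mock_lock_head(struct lock_mock_page *p)
{
	return (p->flags & (1UL << MOCK_PG_tail)) ? p->first_page : p;
}

/* Like trylock_page() after this patch: always lock the head page. */
static inline bool mock_trylock_page(struct lock_mock_page *p)
{
	p = mock_lock_head(p);
	if (p->flags & (1UL << MOCK_PG_locked))
		return false;
	p->flags |= 1UL << MOCK_PG_locked;
	return true;
}

/* Test helper: policy is not enforced, so a tail page maps to its head. */
static inline bool mock_page_locked(struct lock_mock_page *p)
{
	p = mock_lock_head(p);
	return (p->flags & (1UL << MOCK_PG_locked)) != 0;
}

/* Assumption: unlock resolves the head page as well. */
static inline void mock_unlock_page(struct lock_mock_page *p)
{
	p = mock_lock_head(p);
	p->flags &= ~(1UL << MOCK_PG_locked);
}
```

Locking through a tail page and through the head page contend on the
same bit, which is the point of keeping PG_locked on the head page only.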

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/cifs/file.c             |  8 ++++----
 include/linux/page-flags.h |  2 +-
 include/linux/pagemap.h    | 25 ++++++++-----------------
 mm/filemap.c               | 15 +++++++++------
 mm/ksm.c                   |  2 +-
 mm/memory-failure.c        |  2 +-
 mm/migrate.c               |  2 +-
 mm/shmem.c                 |  4 ++--
 mm/slub.c                  |  2 ++
 mm/swap_state.c            |  4 ++--
 mm/vmscan.c                |  4 ++--
 mm/zswap.c                 |  4 ++--
 12 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index ca30c391a894..b9fd85dfee9b 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3413,13 +3413,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 	 * should have access to this page, we're safe to simply set
 	 * PG_locked without checking it first.
 	 */
-	__set_page_locked(page);
+	__SetPageLocked(page);
 	rc = add_to_page_cache_locked(page, mapping,
 				      page->index, GFP_KERNEL);
 
 	/* give up if we can't stick it in the cache */
 	if (rc) {
-		__clear_page_locked(page);
+		__ClearPageLocked(page);
 		return rc;
 	}
 
@@ -3440,10 +3440,10 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 		if (*bytes + PAGE_CACHE_SIZE > rsize)
 			break;
 
-		__set_page_locked(page);
+		__SetPageLocked(page);
 		if (add_to_page_cache_locked(page, mapping, page->index,
 								GFP_KERNEL)) {
-			__clear_page_locked(page);
+			__ClearPageLocked(page);
 			break;
 		}
 		list_move_tail(&page->lru, tmplist);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 32ea62c0ad30..10bdde20b14c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -269,7 +269,7 @@ static inline struct page *compound_head_fast(struct page *page)
 	return page;
 }
 
-TESTPAGEFLAG(Locked, locked, ANY)
+__PAGEFLAG(Locked, locked, NO_TAIL)
 PAGEFLAG(Error, error, ANY) TESTCLEARFLAG(Error, error, ANY)
 PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
 	__SETPAGEFLAG(Referenced, referenced, ANY)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4b3736f7065c..7c3790764795 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -426,18 +426,9 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 extern void unlock_page(struct page *page);
 
-static inline void __set_page_locked(struct page *page)
-{
-	__set_bit(PG_locked, &page->flags);
-}
-
-static inline void __clear_page_locked(struct page *page)
-{
-	__clear_bit(PG_locked, &page->flags);
-}
-
 static inline int trylock_page(struct page *page)
 {
+	page = compound_head(page);
 	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
 }
 
@@ -490,9 +481,9 @@ extern int wait_on_page_bit_killable_timeout(struct page *page,
 
 static inline int wait_on_page_locked_killable(struct page *page)
 {
-	if (PageLocked(page))
-		return wait_on_page_bit_killable(page, PG_locked);
-	return 0;
+	if (!PageLocked(page))
+		return 0;
+	return wait_on_page_bit_killable(compound_head(page), PG_locked);
 }
 
 extern wait_queue_head_t *page_waitqueue(struct page *page);
@@ -511,7 +502,7 @@ static inline void wake_up_page(struct page *page, int bit)
 static inline void wait_on_page_locked(struct page *page)
 {
 	if (PageLocked(page))
-		wait_on_page_bit(page, PG_locked);
+		wait_on_page_bit(compound_head(page), PG_locked);
 }
 
 /* 
@@ -656,17 +647,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
- * the page is new, so we can just run __set_page_locked() against it.
+ * the page is new, so we can just run __SetPageLocked() against it.
  */
 static inline int add_to_page_cache(struct page *page,
 		struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
 {
 	int error;
 
-	__set_page_locked(page);
+	__SetPageLocked(page);
 	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
 	if (unlikely(error))
-		__clear_page_locked(page);
+		__ClearPageLocked(page);
 	return error;
 }
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 12548d03c11d..467768d4263b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -615,11 +615,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 	void *shadow = NULL;
 	int ret;
 
-	__set_page_locked(page);
+	__SetPageLocked(page);
 	ret = __add_to_page_cache_locked(page, mapping, offset,
 					 gfp_mask, &shadow);
 	if (unlikely(ret))
-		__clear_page_locked(page);
+		__ClearPageLocked(page);
 	else {
 		/*
 		 * The page might have been evicted from cache only
@@ -742,6 +742,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
  */
 void unlock_page(struct page *page)
 {
+	page = compound_head(page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	clear_bit_unlock(PG_locked, &page->flags);
 	smp_mb__after_atomic();
@@ -806,18 +807,20 @@ EXPORT_SYMBOL_GPL(page_endio);
  */
 void __lock_page(struct page *page)
 {
-	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+	struct page *page_head = compound_head(page);
+	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
 
-	__wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
+	__wait_on_bit_lock(page_waitqueue(page_head), &wait, bit_wait_io,
 							TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(__lock_page);
 
 int __lock_page_killable(struct page *page)
 {
-	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+	struct page *page_head = compound_head(page);
+	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
 
-	return __wait_on_bit_lock(page_waitqueue(page), &wait,
+	return __wait_on_bit_lock(page_waitqueue(page_head), &wait,
 					bit_wait_io, TASK_KILLABLE);
 }
 EXPORT_SYMBOL_GPL(__lock_page_killable);
diff --git a/mm/ksm.c b/mm/ksm.c
index 4162dce2eb44..23138e99a531 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1884,7 +1884,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
 
 		SetPageDirty(new_page);
 		__SetPageUptodate(new_page);
-		__set_page_locked(new_page);
+		__SetPageLocked(new_page);
 	}
 
 	return new_page;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d487f8dc6d39..399eee44d13d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1136,7 +1136,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 	/*
 	 * We ignore non-LRU pages for good reasons.
 	 * - PG_locked is only well defined for LRU pages and a few others
-	 * - to avoid races with __set_page_locked()
+	 * - to avoid races with __SetPageLocked()
 	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
 	 * The check (unnecessarily) ignores LRU pages being isolated and
 	 * walked by the page reclaim code, however that's not a big loss.
diff --git a/mm/migrate.c b/mm/migrate.c
index 6aa9a4222ea9..114602a68111 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1734,7 +1734,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		flush_tlb_range(vma, mmun_start, mmun_end);
 
 	/* Prepare a page as a migration target */
-	__set_page_locked(new_page);
+	__SetPageLocked(new_page);
 	SetPageSwapBacked(new_page);
 
 	/* anon mapping, we can simply copy page->mapping to the new page: */
diff --git a/mm/shmem.c b/mm/shmem.c
index 80b360c7bcd1..2e2b943c8e62 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -981,7 +981,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	copy_highpage(newpage, oldpage);
 	flush_dcache_page(newpage);
 
-	__set_page_locked(newpage);
+	__SetPageLocked(newpage);
 	SetPageUptodate(newpage);
 	SetPageSwapBacked(newpage);
 	set_page_private(newpage, swap_index);
@@ -1173,7 +1173,7 @@ repeat:
 		}
 
 		__SetPageSwapBacked(page);
-		__set_page_locked(page);
+		__SetPageLocked(page);
 		if (sgp == SGP_WRITE)
 			__SetPageReferenced(page);
 
diff --git a/mm/slub.c b/mm/slub.c
index 2584d4ff02eb..f33ae2b7a5e7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -338,11 +338,13 @@ static inline int oo_objects(struct kmem_cache_order_objects x)
  */
 static __always_inline void slab_lock(struct page *page)
 {
+	VM_BUG_ON_PAGE(PageTail(page), page);
 	bit_spin_lock(PG_locked, &page->flags);
 }
 
 static __always_inline void slab_unlock(struct page *page)
 {
+	VM_BUG_ON_PAGE(PageTail(page), page);
 	__bit_spin_unlock(PG_locked, &page->flags);
 }
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 405923f77334..d1c4a25b4362 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -357,7 +357,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		}
 
 		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
-		__set_page_locked(new_page);
+		__SetPageLocked(new_page);
 		SetPageSwapBacked(new_page);
 		err = __add_to_swap_cache(new_page, entry);
 		if (likely(!err)) {
@@ -371,7 +371,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		}
 		radix_tree_preload_end();
 		ClearPageSwapBacked(new_page);
-		__clear_page_locked(new_page);
+		__ClearPageLocked(new_page);
 		/*
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 260c413d39cd..dc6cd51577a6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1062,7 +1062,7 @@ unmap:
 				VM_BUG_ON_PAGE(PageSwapCache(page), page);
 				if (!page_freeze_refs(page, 1))
 					goto keep_locked;
-				__clear_page_locked(page);
+				__ClearPageLocked(page);
 				count_vm_event(PGLAZYFREED);
 				goto free_it;
 			}
@@ -1174,7 +1174,7 @@ unmap:
 		 * we obviously don't have to worry about waking up a process
 		 * waiting on the page lock, because there are no references.
 		 */
-		__clear_page_locked(page);
+		__ClearPageLocked(page);
 free_it:
 		nr_reclaimed++;
 
diff --git a/mm/zswap.c b/mm/zswap.c
index 4249e82ff934..f8583f1fc938 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -490,7 +490,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
 		}
 
 		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
-		__set_page_locked(new_page);
+		__SetPageLocked(new_page);
 		SetPageSwapBacked(new_page);
 		err = __add_to_swap_cache(new_page, entry);
 		if (likely(!err)) {
@@ -501,7 +501,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
 		}
 		radix_tree_preload_end();
 		ClearPageSwapBacked(new_page);
-		__clear_page_locked(new_page);
+		__ClearPageLocked(new_page);
 		/*
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

It seems we don't currently have compound pages on the FS/IO path. Use
NO_COMPOUND to catch it if we do.

The odd exception is PG_dirty: sound uses compound pages and maps them
with PTEs. NO_COMPOUND triggers VM_BUG_ON() in set_page_dirty() when
handling a shared fault. Let's use HEAD for PG_dirty.
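
The difference between the two policies can be sketched in userspace as
follows. This is an assumption-laden model, not the macros from patch
03/16: the demo_* names, the is_head field and the NULL return (used in
place of VM_BUG_ON()) are all hypothetical, and the policy semantics are
inferred from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel definition. */
struct demo_page {
	unsigned long flags;
	bool is_head;			/* head of a compound page */
	struct demo_page *first_page;	/* non-NULL only for tail pages */
};

enum demo_policy { DEMO_ANY, DEMO_HEAD, DEMO_NO_COMPOUND };

#define DEMO_PG_DIRTY 4

/*
 * Pick the page whose flags word a helper should touch. NULL marks the
 * cases where this model assumes the kernel would hit VM_BUG_ON().
 */
static struct demo_page *demo_resolve(struct demo_page *page,
				      enum demo_policy policy)
{
	bool tail = page->first_page != NULL;

	switch (policy) {
	case DEMO_HEAD:
		/* Redirect tail pages to the head page. */
		return tail ? page->first_page : page;
	case DEMO_NO_COMPOUND:
		/* Reject any part of a compound page. */
		return (tail || page->is_head) ? NULL : page;
	case DEMO_ANY:
	default:
		return page;
	}
}

/* Set the dirty bit under the given policy; false means "would BUG". */
static bool demo_set_dirty(struct demo_page *page, enum demo_policy policy)
{
	struct demo_page *target = demo_resolve(page, policy);

	if (!target)
		return false;
	target->flags |= 1UL << DEMO_PG_DIRTY;
	return true;
}
```

With DEMO_HEAD, dirtying a PTE-mapped tail page (the sound subsystem
case) lands on the head page instead of failing, which is the behaviour
the message above argues for.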

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 10bdde20b14c..df2493860821 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -270,16 +270,16 @@ static inline struct page *compound_head_fast(struct page *page)
 }
 
 __PAGEFLAG(Locked, locked, NO_TAIL)
-PAGEFLAG(Error, error, ANY) TESTCLEARFLAG(Error, error, ANY)
+PAGEFLAG(Error, error, NO_COMPOUND) TESTCLEARFLAG(Error, error, NO_COMPOUND)
 PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
 	__SETPAGEFLAG(Referenced, referenced, ANY)
-PAGEFLAG(Dirty, dirty, ANY) TESTSCFLAG(Dirty, dirty, ANY)
-	__CLEARPAGEFLAG(Dirty, dirty, ANY)
+PAGEFLAG(Dirty, dirty, HEAD) TESTSCFLAG(Dirty, dirty, HEAD)
+	__CLEARPAGEFLAG(Dirty, dirty, HEAD)
 PAGEFLAG(LRU, lru, ANY) __CLEARPAGEFLAG(LRU, lru, ANY)
 PAGEFLAG(Active, active, ANY) __CLEARPAGEFLAG(Active, active, ANY)
 	TESTCLEARFLAG(Active, active, ANY)
 __PAGEFLAG(Slab, slab, ANY)
-PAGEFLAG(Checked, checked, ANY)		/* Used by some filesystems */
+PAGEFLAG(Checked, checked, NO_COMPOUND) /* Used by some filesystems */
 PAGEFLAG(Pinned, pinned, ANY) TESTSCFLAG(Pinned, pinned, ANY)	/* Xen */
 PAGEFLAG(SavePinned, savepinned, ANY);			/* Xen */
 PAGEFLAG(Foreign, foreign, ANY);				/* Xen */
@@ -305,12 +305,15 @@ PAGEFLAG(OwnerPriv1, owner_priv_1, ANY)
  * Only test-and-set exist for PG_writeback.  The unconditional operators are
  * risky: they bypass page accounting.
  */
-TESTPAGEFLAG(Writeback, writeback, ANY) TESTSCFLAG(Writeback, writeback, ANY)
-PAGEFLAG(MappedToDisk, mappedtodisk, ANY)
+TESTPAGEFLAG(Writeback, writeback, NO_COMPOUND)
+	TESTSCFLAG(Writeback, writeback, NO_COMPOUND)
+PAGEFLAG(MappedToDisk, mappedtodisk, NO_COMPOUND)
 
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
-PAGEFLAG(Reclaim, reclaim, ANY) TESTCLEARFLAG(Reclaim, reclaim, ANY)
-PAGEFLAG(Readahead, reclaim, ANY) TESTCLEARFLAG(Readahead, reclaim, ANY)
+PAGEFLAG(Reclaim, reclaim, NO_COMPOUND)
+	TESTCLEARFLAG(Reclaim, reclaim, NO_COMPOUND)
+PAGEFLAG(Readahead, reclaim, NO_COMPOUND)
+	TESTCLEARFLAG(Readahead, reclaim, NO_COMPOUND)
 
 #ifdef CONFIG_HIGHMEM
 /*
@@ -419,7 +422,7 @@ static inline int PageUptodate(struct page *page)
 static inline void __SetPageUptodate(struct page *page)
 {
 	smp_wmb();
-	__set_bit(PG_uptodate, &(page)->flags);
+	__set_bit(PG_uptodate, &page->flags);
 }
 
 static inline void SetPageUptodate(struct page *page)
@@ -430,7 +433,7 @@ static inline void SetPageUptodate(struct page *page)
 	 * uptodate are actually visible before PageUptodate becomes true.
 	 */
 	smp_wmb();
-	set_bit(PG_uptodate, &(page)->flags);
+	set_bit(PG_uptodate, &page->flags);
 }
 
 CLEARPAGEFLAG(Uptodate, uptodate, ANY)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 06/16] page-flags: define behavior of LRU-related flags on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

Only head pages are ever on LRU. Let's use HEAD policy to avoid any
confusion for all LRU-related flags.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index df2493860821..bdb0d0e226c4 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -271,13 +271,14 @@ static inline struct page *compound_head_fast(struct page *page)
 
 __PAGEFLAG(Locked, locked, NO_TAIL)
 PAGEFLAG(Error, error, NO_COMPOUND) TESTCLEARFLAG(Error, error, NO_COMPOUND)
-PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
-	__SETPAGEFLAG(Referenced, referenced, ANY)
+PAGEFLAG(Referenced, referenced, HEAD)
+	TESTCLEARFLAG(Referenced, referenced, HEAD)
+	__SETPAGEFLAG(Referenced, referenced, HEAD)
 PAGEFLAG(Dirty, dirty, HEAD) TESTSCFLAG(Dirty, dirty, HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, HEAD)
-PAGEFLAG(LRU, lru, ANY) __CLEARPAGEFLAG(LRU, lru, ANY)
-PAGEFLAG(Active, active, ANY) __CLEARPAGEFLAG(Active, active, ANY)
-	TESTCLEARFLAG(Active, active, ANY)
+PAGEFLAG(LRU, lru, HEAD) __CLEARPAGEFLAG(LRU, lru, HEAD)
+PAGEFLAG(Active, active, HEAD) __CLEARPAGEFLAG(Active, active, HEAD)
+	TESTCLEARFLAG(Active, active, HEAD)
 __PAGEFLAG(Slab, slab, ANY)
 PAGEFLAG(Checked, checked, NO_COMPOUND) /* Used by some filesystems */
 PAGEFLAG(Pinned, pinned, ANY) TESTSCFLAG(Pinned, pinned, ANY)	/* Xen */
@@ -331,9 +332,9 @@ PAGEFLAG(SwapCache, swapcache, ANY)
 PAGEFLAG_FALSE(SwapCache)
 #endif
 
-PAGEFLAG(Unevictable, unevictable, ANY)
-	__CLEARPAGEFLAG(Unevictable, unevictable, ANY)
-	TESTCLEARFLAG(Unevictable, unevictable, ANY)
+PAGEFLAG(Unevictable, unevictable, HEAD)
+	__CLEARPAGEFLAG(Unevictable, unevictable, HEAD)
+	TESTCLEARFLAG(Unevictable, unevictable, HEAD)
 
 #ifdef CONFIG_MMU
 PAGEFLAG(Mlocked, mlocked, ANY) __CLEARPAGEFLAG(Mlocked, mlocked, ANY)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 07/16] page-flags: define behavior SL*B-related flags on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

SL*B uses compound pages and marks head pages with PG_slab.
__SetPageSlab() and __ClearPageSlab() are never called for tail pages.

The same applies to PG_slob_free in the SLOB allocator.

NO_TAIL is appropriate for these flags.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index bdb0d0e226c4..d41c63b566b8 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -279,7 +279,8 @@ PAGEFLAG(Dirty, dirty, HEAD) TESTSCFLAG(Dirty, dirty, HEAD)
 PAGEFLAG(LRU, lru, HEAD) __CLEARPAGEFLAG(LRU, lru, HEAD)
 PAGEFLAG(Active, active, HEAD) __CLEARPAGEFLAG(Active, active, HEAD)
 	TESTCLEARFLAG(Active, active, HEAD)
-__PAGEFLAG(Slab, slab, ANY)
+__PAGEFLAG(Slab, slab, NO_TAIL)
+__PAGEFLAG(SlobFree, slob_free, NO_TAIL)
 PAGEFLAG(Checked, checked, NO_COMPOUND) /* Used by some filesystems */
 PAGEFLAG(Pinned, pinned, ANY) TESTSCFLAG(Pinned, pinned, ANY)	/* Xen */
 PAGEFLAG(SavePinned, savepinned, ANY);			/* Xen */
@@ -289,8 +290,6 @@ PAGEFLAG(SwapBacked, swapbacked, ANY)
 	__CLEARPAGEFLAG(SwapBacked, swapbacked, ANY)
 	__SETPAGEFLAG(SwapBacked, swapbacked, ANY)
 
-__PAGEFLAG(SlobFree, slob_free, ANY)
-
 /*
  * Private page markings that may be used by the filesystem that owns the page
  * for its own purposes.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread
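
The NO_TAIL policy chosen above can be modelled in user space as
follows. This is only a sketch under assumptions: the toy_* names are
hypothetical, and the behaviour (modifying through a tail page is a
caller bug, testing through a tail page is answered from the head) is
my reading of how the PG_uptodate helpers later in this series treat
tail pages. Where the kernel would use a debug assertion, the sketch
simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel layout. */
struct toy_page {
	unsigned long flags;
	struct toy_page *head;	/* non-NULL only for tail pages */
};

enum { TOY_PG_slab = 0 };

static bool toy_page_tail(const struct toy_page *page)
{
	return page->head != NULL;
}

/* NO_TAIL (assumed): refuse to modify the flag through a tail page. */
static bool toy_set_slab(struct toy_page *page)
{
	assert(page != NULL);
	if (toy_page_tail(page))
		return false;
	page->flags |= 1UL << TOY_PG_slab;
	return true;
}

/* Testing through a tail page is answered from the head page. */
static bool toy_test_slab(const struct toy_page *page)
{
	const struct toy_page *head = page->head ? page->head : page;

	return (head->flags >> TOY_PG_slab) & 1UL;
}
```

Returning false instead of crashing keeps the sketch testable; it is not
meant to suggest that the kernel helpers return a status.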

* [PATCH 08/16] page-flags: define behavior of Xen-related flags on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

PG_pinned and PG_savepinned apply to page table pages, which are never
compound.

I'm not so sure about PG_foreign, but it seems we shouldn't see compound
pages there either.

Let's use NO_COMPOUND for all of them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d41c63b566b8..19373c98d08a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -282,9 +282,12 @@ PAGEFLAG(Active, active, HEAD) __CLEARPAGEFLAG(Active, active, HEAD)
 __PAGEFLAG(Slab, slab, NO_TAIL)
 __PAGEFLAG(SlobFree, slob_free, NO_TAIL)
 PAGEFLAG(Checked, checked, NO_COMPOUND) /* Used by some filesystems */
-PAGEFLAG(Pinned, pinned, ANY) TESTSCFLAG(Pinned, pinned, ANY)	/* Xen */
-PAGEFLAG(SavePinned, savepinned, ANY);			/* Xen */
-PAGEFLAG(Foreign, foreign, ANY);				/* Xen */
+
+/* Xen */
+PAGEFLAG(Pinned, pinned, NO_COMPOUND) TESTSCFLAG(Pinned, pinned, NO_COMPOUND)
+PAGEFLAG(SavePinned, savepinned, NO_COMPOUND)
+PAGEFLAG(Foreign, foreign, NO_COMPOUND)
+
 PAGEFLAG(Reserved, reserved, ANY) __CLEARPAGEFLAG(Reserved, reserved, ANY)
 PAGEFLAG(SwapBacked, swapbacked, ANY)
 	__CLEARPAGEFLAG(SwapBacked, swapbacked, ANY)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread
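
For comparison with the other policies, NO_COMPOUND can be sketched in
user space like this. The toy_* names and the is_head field are
hypothetical, and the semantics are assumed from the policy name and the
commit message: the flag is only meaningful on pages that are neither
the head nor a tail of a compound page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel layout. */
struct toy_page {
	unsigned long flags;
	bool is_head;		/* first page of a compound page */
	struct toy_page *head;	/* non-NULL only for tail pages */
};

enum { TOY_PG_pinned = 0 };

/* True for both the head and the tails of a compound page. */
static bool toy_page_compound(const struct toy_page *page)
{
	return page->is_head || page->head != NULL;
}

/* NO_COMPOUND (assumed): the flag may only be set on ordinary pages. */
static bool toy_set_pinned(struct toy_page *page)
{
	assert(page != NULL);
	if (toy_page_compound(page))
		return false;
	page->flags |= 1UL << TOY_PG_pinned;
	return true;
}
```

Unlike NO_TAIL, this model rejects the head page as well, so a flag
under this policy never appears anywhere in a compound page.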

* [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

As far as I can see there are no users of PG_reserved on compound pages.
Let's use NO_COMPOUND here.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 19373c98d08a..be691551896b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -288,7 +288,8 @@ PAGEFLAG(Pinned, pinned, NO_COMPOUND) TESTSCFLAG(Pinned, pinned, NO_COMPOUND)
 PAGEFLAG(SavePinned, savepinned, NO_COMPOUND)
 PAGEFLAG(Foreign, foreign, NO_COMPOUND)
 
-PAGEFLAG(Reserved, reserved, ANY) __CLEARPAGEFLAG(Reserved, reserved, ANY)
+PAGEFLAG(Reserved, reserved, NO_COMPOUND)
+	__CLEARPAGEFLAG(Reserved, reserved, NO_COMPOUND)
 PAGEFLAG(SwapBacked, swapbacked, ANY)
 	__CLEARPAGEFLAG(SwapBacked, swapbacked, ANY)
 	__SETPAGEFLAG(SwapBacked, swapbacked, ANY)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 10/16] page-flags: define PG_swapbacked behavior on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

PG_swapbacked is used for transparent huge pages, and only on head pages.
Let's use NO_TAIL policy.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index be691551896b..d1d08508984d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -290,9 +290,9 @@ PAGEFLAG(Foreign, foreign, NO_COMPOUND)
 
 PAGEFLAG(Reserved, reserved, NO_COMPOUND)
 	__CLEARPAGEFLAG(Reserved, reserved, NO_COMPOUND)
-PAGEFLAG(SwapBacked, swapbacked, ANY)
-	__CLEARPAGEFLAG(SwapBacked, swapbacked, ANY)
-	__SETPAGEFLAG(SwapBacked, swapbacked, ANY)
+PAGEFLAG(SwapBacked, swapbacked, NO_TAIL)
+	__CLEARPAGEFLAG(SwapBacked, swapbacked, NO_TAIL)
+	__SETPAGEFLAG(SwapBacked, swapbacked, NO_TAIL)
 
 /*
  * Private page markings that may be used by the filesystem that owns the page
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread
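
The definitions touched by these patches are one-line macro invocations
that generate the test/set/clear helpers for a flag, with the third
argument selecting the compound-page policy. The user-space sketch below
shows the general shape of such a generator. It is an assumption-laden
model, not the macros from <linux/page-flags.h>: every TOY_*/toy_* name
is hypothetical, and only two simplified policies are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel layout. */
struct toy_page {
	unsigned long flags;
	struct toy_page *head;	/* non-NULL only for tail pages */
};

enum toy_pageflags { TOY_PG_swapbacked, TOY_PG_dirty };

/* A policy maps the page passed in to the page whose flags are used. */
static struct toy_page *toy_any(struct toy_page *page)
{
	assert(page != NULL);
	return page;
}

static struct toy_page *toy_head(struct toy_page *page)
{
	assert(page != NULL);
	return page->head ? page->head : page;
}

#define TOY_TESTPAGEFLAG(uname, lname, policy)				\
static bool ToyPage##uname(struct toy_page *page)			\
{									\
	return (policy(page)->flags >> TOY_PG_##lname) & 1UL;		\
}

#define TOY_SETPAGEFLAG(uname, lname, policy)				\
static void ToySetPage##uname(struct toy_page *page)			\
{									\
	policy(page)->flags |= 1UL << TOY_PG_##lname;			\
}

#define TOY_CLEARPAGEFLAG(uname, lname, policy)				\
static void ToyClearPage##uname(struct toy_page *page)			\
{									\
	policy(page)->flags &= ~(1UL << TOY_PG_##lname);		\
}

/* One invocation emits all three helpers for a flag. */
#define TOY_PAGEFLAG(uname, lname, policy)				\
	TOY_TESTPAGEFLAG(uname, lname, policy)				\
	TOY_SETPAGEFLAG(uname, lname, policy)				\
	TOY_CLEARPAGEFLAG(uname, lname, policy)

TOY_PAGEFLAG(SwapBacked, swapbacked, toy_any)
TOY_PAGEFLAG(Dirty, dirty, toy_head)
```

With toy_any, a flag set through a tail page lands on the tail itself,
which is the kind of unintended per-tail state this series is trying to
rule out; with toy_head the same call is redirected to the head page.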

* [PATCH 11/16] page-flags: define PG_swapcache behavior on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

Swap cannot handle compound pages so far. Transparent huge pages are
split on the way to swap, so PG_swapcache should never be seen on a
compound page. Let's use NO_COMPOUND.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d1d08508984d..9ea90bb8cb89 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -330,7 +330,7 @@ PAGEFLAG_FALSE(HighMem)
 #endif
 
 #ifdef CONFIG_SWAP
-PAGEFLAG(SwapCache, swapcache, ANY)
+PAGEFLAG(SwapCache, swapcache, NO_COMPOUND)
 #else
 PAGEFLAG_FALSE(SwapCache)
 #endif
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 12/16] page-flags: define PG_mlocked behavior on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

Transparent huge pages can be mlocked, but only as a whole compound page
at once. Something has gone wrong if we're trying to mlock() a tail page.
Let's use NO_TAIL.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9ea90bb8cb89..a1ecec3f505f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -340,8 +340,9 @@ PAGEFLAG(Unevictable, unevictable, HEAD)
 	TESTCLEARFLAG(Unevictable, unevictable, HEAD)
 
 #ifdef CONFIG_MMU
-PAGEFLAG(Mlocked, mlocked, ANY) __CLEARPAGEFLAG(Mlocked, mlocked, ANY)
-	TESTSCFLAG(Mlocked, mlocked, ANY) __TESTCLEARFLAG(Mlocked, mlocked, ANY)
+PAGEFLAG(Mlocked, mlocked, NO_TAIL) __CLEARPAGEFLAG(Mlocked, mlocked, NO_TAIL)
+	TESTSCFLAG(Mlocked, mlocked, NO_TAIL)
+	__TESTCLEARFLAG(Mlocked, mlocked, NO_TAIL)
 #else
 PAGEFLAG_FALSE(Mlocked) __CLEARPAGEFLAG_NOOP(Mlocked)
 	TESTSCFLAG_FALSE(Mlocked) __TESTCLEARFLAG_FALSE(Mlocked)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 13/16] page-flags: define PG_uncached behavior on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

So far, only IA64 uses PG_uncached and only on non-compound pages.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a1ecec3f505f..0b6921d2f2f3 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -349,7 +349,7 @@ PAGEFLAG_FALSE(Mlocked) __CLEARPAGEFLAG_NOOP(Mlocked)
 #endif
 
 #ifdef CONFIG_ARCH_USES_PG_UNCACHED
-PAGEFLAG(Uncached, uncached, ANY)
+PAGEFLAG(Uncached, uncached, NO_COMPOUND)
 #else
 PAGEFLAG_FALSE(Uncached)
 #endif
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread

* [PATCH 14/16] page-flags: define PG_uptodate behavior on compound pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

We use PG_uptodate on the head page of a transparent huge page.
Let's use NO_TAIL.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0b6921d2f2f3..55a69c40e4ae 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -408,8 +408,9 @@ u64 stable_page_flags(struct page *page);
 
 static inline int PageUptodate(struct page *page)
 {
-	int ret = test_bit(PG_uptodate, &(page)->flags);
-
+	int ret;
+	page = compound_head(page);
+	ret = test_bit(PG_uptodate, &(page)->flags);
 	/*
 	 * Must ensure that the data we read out of the page is loaded
 	 * _after_ we've loaded page->flags to check for PageUptodate.
@@ -426,12 +427,14 @@ static inline int PageUptodate(struct page *page)
 
 static inline void __SetPageUptodate(struct page *page)
 {
+	VM_BUG_ON_PAGE(PageTail(page), page);
 	smp_wmb();
 	__set_bit(PG_uptodate, &page->flags);
 }
 
 static inline void SetPageUptodate(struct page *page)
 {
+	VM_BUG_ON_PAGE(PageTail(page), page);
 	/*
 	 * Memory barrier must be issued before setting the PG_uptodate bit,
 	 * so that all previous stores issued in order to bring the page
@@ -441,7 +444,7 @@ static inline void SetPageUptodate(struct page *page)
 	set_bit(PG_uptodate, &page->flags);
 }
 
-CLEARPAGEFLAG(Uptodate, uptodate, ANY)
+CLEARPAGEFLAG(Uptodate, uptodate, NO_TAIL)
 
 int test_clear_page_writeback(struct page *page);
 int __test_set_page_writeback(struct page *page, bool keep_write);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread
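
The comments in the hunks above describe an ordering requirement: the
page contents must be written before PG_uptodate becomes visible, and a
reader must load the flag before it reads the contents. A user-space
analogue can be sketched with C11 atomics. This is an assumption-based
model: release/acquire ordering stands in for the kernel's barrier
primitives, and struct toy_page and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page; NOT the kernel layout. */
struct toy_page {
	atomic_ulong flags;
	int data;		/* stands in for the page contents */
};

enum { TOY_PG_uptodate = 0 };

/* Writer: fill the contents first, then publish the flag. */
static void toy_set_uptodate(struct toy_page *page, int contents)
{
	page->data = contents;
	/* Release ordering: the store to data cannot pass the flag update. */
	atomic_fetch_or_explicit(&page->flags, 1UL << TOY_PG_uptodate,
				 memory_order_release);
}

/* Reader: test the flag first, and only then read the contents. */
static bool toy_read_if_uptodate(struct toy_page *page, int *out)
{
	unsigned long flags;

	/* Acquire ordering: the load of data cannot move before this. */
	flags = atomic_load_explicit(&page->flags, memory_order_acquire);
	if (!(flags & (1UL << TOY_PG_uptodate)))
		return false;
	*out = page->data;
	return true;
}
```

A reader that sees the flag set is then guaranteed to see the contents
stored before it. The sketch leaves out the compound-page handling that
the patch adds on top of this ordering.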

* [PATCH 15/16] page-flags: look on head page if the flag is encoded in page->mapping
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

PageAnon() and PageKsm() look at the low bits of page->mapping to check
whether the page is Anon or KSM. page->mapping can be overloaded in tail
pages.

Let's always look at the head page to avoid false positives.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 55a69c40e4ae..07fa2781df23 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -385,6 +385,7 @@ PAGEFLAG_FALSE(HWPoison)
 
 static inline int PageAnon(struct page *page)
 {
+	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
 }
 
@@ -397,6 +398,7 @@ static inline int PageAnon(struct page *page)
  */
 static inline int PageKsm(struct page *page)
 {
+	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
 				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread
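
The effect of the two compound_head() calls in patch 15/16 can be shown with a
small userspace sketch. This is not the kernel code: struct toy_page,
toy_compound_head() and the TOY_MAPPING_* values are hypothetical stand-ins
(the flag values 1 and 2 are an assumption made for illustration), and only
the "redirect to the head before decoding the low bits" idea is taken from
the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Type flags kept in the low bits of the mapping pointer (assumed values). */
#define TOY_MAPPING_ANON  ((uintptr_t)1)
#define TOY_MAPPING_KSM   ((uintptr_t)2)
#define TOY_MAPPING_FLAGS (TOY_MAPPING_ANON | TOY_MAPPING_KSM)

/* Hypothetical, heavily reduced stand-in for struct page. */
struct toy_page {
	void *mapping;               /* pointer with flag bits in the low bits */
	struct toy_page *first_page; /* non-NULL only for tail pages */
};

/* Return the head of a compound page, or the page itself otherwise. */
struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->first_page ? page->first_page : page;
}

/* Anon check that always decodes the head page's mapping. */
bool toy_page_anon(struct toy_page *page)
{
	page = toy_compound_head(page);
	return ((uintptr_t)page->mapping & TOY_MAPPING_ANON) != 0;
}

/* KSM check that always decodes the head page's mapping. */
bool toy_page_ksm(struct toy_page *page)
{
	page = toy_compound_head(page);
	return ((uintptr_t)page->mapping & TOY_MAPPING_FLAGS) ==
	       (TOY_MAPPING_ANON | TOY_MAPPING_KSM);
}
```

If a tail page's own mapping field happened to hold a value whose low bits
look like ANON|KSM, a check on the tail itself would report KSM; with the
redirect, the answer comes from the head page only.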

* [PATCH 16/16] mm: sanitize page->mapping for tail pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-19 17:08   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 17:08 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

We don't define the meaning of page->mapping for tail pages. Currently it's
always NULL, which can be inconsistent with the head page and potentially
lead to problems.

Let's poison the pointer to catch all illegal uses.

page_rmapping() and page_mapping() are changed to look at the head page.

The only illegal use I've caught so far is __GFP_COMP pages from the sound
subsystem, mapped with PTEs. do_shared_fault() is changed to use
page_rmapping() instead of direct access to fault_page->mapping.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h     | 1 +
 include/linux/poison.h | 4 ++++
 mm/huge_memory.c       | 2 +-
 mm/memory.c            | 2 +-
 mm/page_alloc.c        | 7 +++++++
 mm/util.c              | 5 ++++-
 6 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcf37dacbee3..4a3a38522ab4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -915,6 +915,7 @@ extern struct address_space *page_mapping(struct page *page);
 /* Neutral page->mapping pointer to address_space or anon_vma or other */
 static inline void *page_rmapping(struct page *page)
 {
+	page = compound_head(page);
 	return (void *)((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
 }
 
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 2110a81c5e2a..7b2a7fcde6a3 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -32,6 +32,10 @@
 /********** mm/debug-pagealloc.c **********/
 #define PAGE_POISON 0xaa
 
+/********** mm/page_alloc.c ************/
+
+#define TAIL_MAPPING	((void *) 0x01014A11 + POISON_POINTER_DELTA)
+
 /********** mm/slab.c **********/
 /*
  * Magic nums for obj red zoning.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3412cc8a4bd4..54d90ed2d31b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1739,7 +1739,7 @@ static void __split_huge_page_refcount(struct page *page,
 		*/
 		page_tail->_mapcount = page->_mapcount;
 
-		BUG_ON(page_tail->mapping);
+		BUG_ON(page_tail->mapping != TAIL_MAPPING);
 		page_tail->mapping = page->mapping;
 
 		page_tail->index = page->index + i;
diff --git a/mm/memory.c b/mm/memory.c
index 5ec794f13f8a..a76f61aa88da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3033,7 +3033,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * pinned by vma->vm_file's reference.  We rely on unlock_page()'s
 	 * release semantics to prevent the compiler from undoing this copying.
 	 */
-	mapping = fault_page->mapping;
+	mapping = page_rmapping(fault_page);
 	unlock_page(fault_page);
 	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
 		/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1b849500640c..e73ecbbfa69f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -373,6 +373,7 @@ void prep_compound_page(struct page *page, unsigned long order)
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
 		set_page_count(p, 0);
+		p->mapping = TAIL_MAPPING;
 		p->first_page = page;
 		/* Make sure p->first_page is always valid for PageTail() */
 		smp_wmb();
@@ -765,6 +766,12 @@ static void free_one_page(struct zone *zone,
 
 static int free_tail_pages_check(struct page *head_page, struct page *page)
 {
+	if (page->mapping != TAIL_MAPPING) {
+		bad_page(page, "corrupted mapping in tail page", 0);
+		page->mapping = NULL;
+		return 1;
+	}
+	page->mapping = NULL;
 	if (!IS_ENABLED(CONFIG_DEBUG_VM))
 		return 0;
 	if (unlikely(!PageTail(page))) {
diff --git a/mm/util.c b/mm/util.c
index d68339206100..769a7a2870af 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -357,7 +357,10 @@ EXPORT_SYMBOL(kvfree);
 
 struct address_space *page_mapping(struct page *page)
 {
-	struct address_space *mapping = page->mapping;
+	struct address_space *mapping;
+
+	page = compound_head(page);
+	mapping = page->mapping;
 
 	/* This happens if someone calls flush_dcache_page on slab page */
 	if (unlikely(PageSlab(page)))
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 119+ messages in thread
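
The poison-on-prepare, check-on-free cycle from patch 16/16 can be sketched in
userspace as follows. Everything here is a simplified assumption for
illustration: struct toy_page and the toy_* functions are hypothetical, the
poison constant omits POISON_POINTER_DELTA, and a corrupted tail is reported
through a return value where the patch calls bad_page().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Poison value for tail pages; the patch adds POISON_POINTER_DELTA to it. */
#define TOY_TAIL_MAPPING ((void *)(uintptr_t)0x01014A11)

/* Hypothetical, heavily reduced stand-in for struct page. */
struct toy_page {
	void *mapping;
	struct toy_page *first_page; /* non-NULL only for tail pages */
};

/* Turn page[1..nr_pages-1] into tails of page[0] and poison their mapping. */
void toy_prep_compound_page(struct toy_page *page, size_t nr_pages)
{
	for (size_t i = 1; i < nr_pages; i++) {
		page[i].mapping = TOY_TAIL_MAPPING;
		page[i].first_page = page;
	}
}

/*
 * Return true if the tail's mapping no longer holds the poison value,
 * meaning someone wrote to it. The field is reset to NULL either way.
 */
bool toy_free_tail_page_check(struct toy_page *tail)
{
	bool corrupted = tail->mapping != TOY_TAIL_MAPPING;

	tail->mapping = NULL;
	return corrupted;
}

/* Mapping lookup that redirects tail pages to their head page. */
void *toy_page_mapping(struct toy_page *page)
{
	if (page->first_page)
		page = page->first_page;
	return page->mapping;
}
```

A reader that dereferences a tail's mapping directly would hit the poison
value, while a reader going through the lookup helper gets the head page's
mapping.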

* Re: [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-19 17:08   ` Kirill A. Shutemov
@ 2015-03-19 18:29     ` Dave Hansen
  -1 siblings, 0 replies; 119+ messages in thread
From: Dave Hansen @ 2015-03-19 18:29 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On 03/19/2015 10:08 AM, Kirill A. Shutemov wrote:
> The odd exception is PG_dirty: sound uses compound pages and maps them
> with PTEs. NO_COMPOUND triggers VM_BUG_ON() in set_page_dirty() on
> handling shared fault. Let's use HEAD for PG_dirty.

Can we get the sound guys to look at this, btw?  It seems like an odd
thing that we probably don't want to keep around, right?

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-19 18:29     ` Dave Hansen
@ 2015-03-19 20:02       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-19 20:02 UTC (permalink / raw)
  To: Dave Hansen, Hugh Dickins
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm,
	Jaroslav Kysela, Takashi Iwai, alsa-devel

On Thu, Mar 19, 2015 at 11:29:52AM -0700, Dave Hansen wrote:
> On 03/19/2015 10:08 AM, Kirill A. Shutemov wrote:
> > The odd exception is PG_dirty: sound uses compound pages and maps them
> > with PTEs. NO_COMPOUND triggers VM_BUG_ON() in set_page_dirty() on
> > handling shared fault. Let's use HEAD for PG_dirty.
> 
> Can we get the sound guys to look at this, btw?  It seems like an odd
> thing that we probably don't want to keep around, right?

CC: +sound guys

I'm not sure what the right fix is here. At the time, adding __GFP_COMP was
a fix: see f3d48f0373c1.

The other odd part about __GFP_COMP here is that ->_mapcount in tail pages
is used for both the mapcount of the individual page and for gup pins.
__compound_tail_refcounted() doesn't recognize that we don't need
tail page accounting for these pages.

Hugh, I tried to ask you about the situation several times (last time at
the summit). Any comments?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 03/16] page-flags: introduce page flags policies wrt compound pages
  2015-03-19 17:08   ` Kirill A. Shutemov
@ 2015-03-20 20:35     ` Andrew Morton
  -1 siblings, 0 replies; 119+ messages in thread
From: Andrew Morton @ 2015-03-20 20:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Thu, 19 Mar 2015 19:08:09 +0200 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> This patch adds a third argument to the macros which create function
> definitions for page flags. This argument defines how page-flags helpers
> behave on compound pages.
> 
> For now we define four policies:
> 
>  - ANY: the helper function operates on the page it gets, regardless of
>    whether it's non-compound, head or tail.
> 
>  - HEAD: the helper function operates on the head page of the compound
>    page if it gets a tail page.
> 
>  - NO_TAIL: only head and non-compound pages are acceptable for this
>    helper function.
> 
>  - NO_COMPOUND: only non-compound pages are acceptable for this helper
>    function.
> 
> For now we use policy ANY for all helpers, which matches current
> behaviour.
> 
> We do not enforce the policy for TESTPAGEFLAG, because we have flags
> checked for random pages all over the kernel. A notable exception to
> this is PageTransHuge(), which triggers VM_BUG_ON() for a tail page.
> 
> +/* Page flags policies wrt compound pages */
> +#define ANY(page, enforce)	page
> +#define HEAD(page, enforce)	compound_head(page)
> +#define NO_TAIL(page, enforce) ({					\
> +#define NO_COMPOUND(page, enforce) ({					\
> ...
>
> +#undef ANY
> +#undef HEAD
> +#undef NO_TAIL
> +#undef NO_COMPOUND
>  #endif /* !__GENERATING_BOUNDS_H */

This is risky - there are existing definitions of ANY and HEAD, and
this code may go and undefine them.  This is improbable at present, as
those definitions are in .c, after all includes.  But still, it's not
good to chew off great hunks of the namespace like this.

So I think I'll prefix all these with "PF_", OK?

^ permalink raw reply	[flat|nested] 119+ messages in thread
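
The namespace hazard described above can be reproduced in a single file. This
is only a sketch under assumptions: the "header" is simulated inline, the
earlier ANY and HEAD definitions are made up, and PF_ANY is used purely to
show the prefixed style proposed in this message.

```c
/* Unrelated definitions that some other code made before the header. */
#define ANY  1
#define HEAD 2

/*
 * Simulated header, unprefixed style: cleaning up its helper macro at the
 * end also removes the unrelated HEAD defined above.
 */
#undef HEAD

/*
 * Simulated header, prefixed style: it defines and undefines only its own
 * PF_ANY, so the unrelated ANY is left alone.
 */
#define PF_ANY(page, enforce) (page)
#undef PF_ANY

/* Record what survived, so it can be inspected at run time. */
#ifdef HEAD
enum { toy_head_survived = 1 };
#else
enum { toy_head_survived = 0 };
#endif

#ifdef ANY
enum { toy_any_survived = 1 };
#else
enum { toy_any_survived = 0 };
#endif
```

Here toy_head_survived ends up 0 and toy_any_survived ends up 1: only the
name the header shared with other code was lost.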

* Re: [PATCH 03/16] page-flags: introduce page flags policies wrt compound pages
  2015-03-20 20:35     ` Andrew Morton
@ 2015-03-20 21:34       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-20 21:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Fri, Mar 20, 2015 at 01:35:53PM -0700, Andrew Morton wrote:
> On Thu, 19 Mar 2015 19:08:09 +0200 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > This patch adds a third argument to the macros which create function
> > definitions for page flags. This argument defines how page-flags helpers
> > behave on compound pages.
> > 
> > For now we define four policies:
> > 
> >  - ANY: the helper function operates on the page it gets, regardless of
> >    whether it's non-compound, head or tail.
> > 
> >  - HEAD: the helper function operates on the head page of the compound
> >    page if it gets a tail page.
> > 
> >  - NO_TAIL: only head and non-compound pages are acceptable for this
> >    helper function.
> > 
> >  - NO_COMPOUND: only non-compound pages are acceptable for this helper
> >    function.
> > 
> > For now we use policy ANY for all helpers, which matches current
> > behaviour.
> > 
> > We do not enforce the policy for TESTPAGEFLAG, because we have flags
> > checked for random pages all over the kernel. A notable exception to
> > this is PageTransHuge(), which triggers VM_BUG_ON() for a tail page.
> > 
> > +/* Page flags policies wrt compound pages */
> > +#define ANY(page, enforce)	page
> > +#define HEAD(page, enforce)	compound_head(page)
> > +#define NO_TAIL(page, enforce) ({					\
> > +#define NO_COMPOUND(page, enforce) ({					\
> > ...
> >
> > +#undef ANY
> > +#undef HEAD
> > +#undef NO_TAIL
> > +#undef NO_COMPOUND
> >  #endif /* !__GENERATING_BOUNDS_H */
> 
> This is risky - there are existing definitions of ANY and HEAD, and
> this code may go and undefine them.  This is improbable at present, as
> those definitions are in .c, after all includes.  But still, it's not
> good to chew off great hunks of the namespace like this.
> 
> So I think I'll prefix all these with "PF_", OK?

Yeah. That's fine.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread
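
The four policies, with the agreed PF_ prefix, can be modelled in userspace
roughly as below. This is a sketch under stated assumptions, not the macros
from the patch (their bodies are elided in the quote above): the policies are
an enum instead of macros, struct toy_page and the toy_* helpers are
hypothetical, a policy violation returns NULL/false where the kernel would
trigger VM_BUG_ON(), and the unenforced NO_TAIL case is assumed to fall back
to the head page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct page. */
struct toy_page {
	unsigned long flags;
	struct toy_page *first_page; /* non-NULL only for tail pages */
	bool is_head;                /* true for the head of a compound page */
};

enum toy_policy { PF_ANY, PF_HEAD, PF_NO_TAIL, PF_NO_COMPOUND };

/*
 * Pick the page a flag helper should operate on.
 * Returns NULL when an enforced policy rejects the page.
 */
struct toy_page *toy_pf_resolve(struct toy_page *page, enum toy_policy policy,
				bool enforce)
{
	bool tail = page->first_page != NULL;
	bool compound = tail || page->is_head;

	switch (policy) {
	case PF_ANY:
		return page;
	case PF_HEAD:
		return tail ? page->first_page : page;
	case PF_NO_TAIL:
		if (tail)
			return enforce ? NULL : page->first_page;
		return page;
	case PF_NO_COMPOUND:
		return (compound && enforce) ? NULL : page;
	}
	return NULL;
}

/* Modifying helper: the policy is enforced. */
bool toy_set_flag(struct toy_page *page, enum toy_policy policy, unsigned int bit)
{
	struct toy_page *target = toy_pf_resolve(page, policy, true);

	if (!target)
		return false;
	target->flags |= 1UL << bit;
	return true;
}

/* Test helper: the policy is not enforced, as described for TESTPAGEFLAG. */
bool toy_test_flag(struct toy_page *page, enum toy_policy policy, unsigned int bit)
{
	struct toy_page *target = toy_pf_resolve(page, policy, false);

	return ((target->flags >> bit) & 1UL) != 0;
}
```

Setting a flag through a tail page with PF_HEAD lands on the head, the same
call with PF_NO_TAIL is rejected, and testing a flag never rejects a page.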

* Re: [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-19 20:02       ` Kirill A. Shutemov
@ 2015-03-23  0:02         ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-03-23  0:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Jaroslav Kysela, Takashi Iwai,
	alsa-devel

On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:
> On Thu, Mar 19, 2015 at 11:29:52AM -0700, Dave Hansen wrote:
> > On 03/19/2015 10:08 AM, Kirill A. Shutemov wrote:
> > > The odd exception is PG_dirty: sound uses compound pages and maps them
> > > with PTEs. NO_COMPOUND triggers VM_BUG_ON() in set_page_dirty() on
> > > handling shared fault. Let's use HEAD for PG_dirty.

It really depends on what you do with PageDirty of the head, when you
get to support 4k pagecache with subpages of a huge compound page.

HEAD will be fine, so long as PageDirty on the head means the whole
huge page must be written back.  I expect that's what you will choose;
but one could consider that if a huge page is only mapped read-only,
but a few subpages of it writable, then only the few need be written
back, in which case ANY would be more appropriate.  NO_COMPOUND is
certainly wrong.

But that does illustrate that I consider this patch series premature:
it belongs with your huge pagecache implementation.  You seem to be
"tidying up" and adding overhead to things that are fine as they are.

> > 
> > Can we get the sound guys to look at this, btw?  It seems like an odd
> > thing that we probably don't want to keep around, right?
> 
> CC: +sound guys

I don't think this is peculiar to sound at all: there are other users
of __GFP_COMP in the tree, aren't there?  And although some of them
might turn out not to need it any more, I expect most of them still
need it for the same reason they did originally.

> 
> I'm not sure what is right fix here. At the time adding __GFP_COMP was a
> fix: see f3d48f0373c1.

The only thing special about this one, was that I failed to add
__GFP_COMP at first.

The purpose of __GFP_COMP is to allow a >0-order page (originally, just
a hugetlb page: see 2.5.60) to be mapped into userspace, and parts of it
then subjected to get_user_pages (ptrace, futex, direct I/O, infiniband
etc), and now even munmap, without destroying the integrity of the
underlying >0-order page.

We don't bother with __GFP_COMP when a >0-order page cannot be mapped
into userspace (except through /dev/mem or suchlike); we add __GFP_COMP
when it might be, to get the right reference counting.

It's normal for set_page_dirty() to be called in the course of
get_user_pages(), and it's normal for set_page_dirty() to be called
when releasing the get_user_pages() references, and it's normal for
set_page_dirty() to be called when munmap'ing a pte_dirty().

> 
> Other odd part about __GFP_COMP here is that we have ->_mapcount in tail
> pages to be used for both: mapcount of the individual page and for gup
> pins. __compound_tail_refcounted() doesn't recognize that we don't need
> tail page accounting for these pages.

So page->_mapcount of the tails is being used for both their mapcount
and their reference count: that's certainly funny, and further reason
to pursue your aim of simplifying the way THPs are refcounted.  But
not responsible for any actual bug, I think?

> 
> Hugh, I tried to ask you about the situation several times (last time on
> the summit). Any comments?

I do remember we began a curtailed conversation about this at LSF/MM.
I do not remember you asking about it earlier: when was that?

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread
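
The HEAD versus ANY trade-off for PG_dirty described above can be made
concrete with a toy model. The model is an assumption for illustration only:
struct toy_compound, the 8-subpage size and the toy_* helpers are
hypothetical, and "writeback" is reduced to counting subpages.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SUBPAGES 8

/* One dirty bit per subpage; dirty[0] belongs to the head page. */
struct toy_compound {
	bool dirty[TOY_NR_SUBPAGES];
};

/* HEAD policy: dirtying any subpage sets the bit on the head page. */
void toy_set_dirty_head(struct toy_compound *c, size_t subpage)
{
	(void)subpage;
	c->dirty[0] = true;
}

/* ANY policy: the bit is set on exactly the subpage that was written. */
void toy_set_dirty_any(struct toy_compound *c, size_t subpage)
{
	c->dirty[subpage] = true;
}

/* HEAD policy: a dirty head means the whole compound page is written back. */
size_t toy_writeback_count_head(const struct toy_compound *c)
{
	return c->dirty[0] ? TOY_NR_SUBPAGES : 0;
}

/* ANY policy: only the subpages that are individually dirty are written back. */
size_t toy_writeback_count_any(const struct toy_compound *c)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_SUBPAGES; i++)
		n += c->dirty[i] ? 1 : 0;
	return n;
}
```

With two subpages written, the HEAD variant writes back all 8 subpages while
the ANY variant writes back 2, which is the difference the choice of policy
has to account for.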

* Re: [PATCH 01/16] mm: consolidate all page-flags helpers in <linux/page-flags.h>
  2015-03-19 17:08   ` Kirill A. Shutemov
@ 2015-03-23  0:10     ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-03-23  0:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:

> We have page-flags helper function declarations/definitions spread over
> several header files. Let's consolidate them in <linux/page-flags.h>.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Acked-by: Hugh Dickins <hughd@google.com>

I find this one helpful (assuming it builds fine everywhere).  I've
several times recently found myself wanting to use PageAnon tests at a low
level, and been frustrated by its positioning in linux/mm.h (see my 10/24).
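
As an illustration of the kind of low-level test meant here, the following
is a minimal user-space sketch of the PageAnon()/PageKsm() checks that the
patch relocates. The one-field struct page and the tag_mapping() helper are
assumptions made for the example, not kernel code; only the three
PAGE_MAPPING_* constants and the two test expressions follow the quoted
patch.

```c
#include <stdint.h>
#include <stddef.h>

/* User-space model only: the real struct page has many more fields. */
struct page {
	void *mapping;	/* address_space, anon_vma or KSM node, tagged in the low bits */
};

#define PAGE_MAPPING_ANON	1UL
#define PAGE_MAPPING_KSM	2UL
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* True when the low bit of ->mapping marks the page as anonymous. */
static inline int PageAnon(const struct page *page)
{
	return ((uintptr_t)page->mapping & PAGE_MAPPING_ANON) != 0;
}

/* True only when both the ANON and KSM bits are set. */
static inline int PageKsm(const struct page *page)
{
	return ((uintptr_t)page->mapping & PAGE_MAPPING_FLAGS) ==
			(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
}

/* Hypothetical helper: tag a suitably aligned pointer with mapping bits. */
static inline void *tag_mapping(void *ptr, unsigned long bits)
{
	return (void *)((uintptr_t)ptr | bits);
}
```

In this model a page with a NULL ->mapping tests as neither PageAnon nor
PageKsm, and a KSM page always tests as PageAnon as well.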

> ---
>  include/linux/hugetlb.h    |  7 ----
>  include/linux/ksm.h        | 17 --------
>  include/linux/mm.h         | 81 --------------------------------------
>  include/linux/page-flags.h | 96 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 96 insertions(+), 105 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 7b5785032049..1a782733a420 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -41,8 +41,6 @@ extern int hugetlb_max_hstate __read_mostly;
>  struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
>  void hugepage_put_subpool(struct hugepage_subpool *spool);
>  
> -int PageHuge(struct page *page);
> -
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
>  int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> @@ -109,11 +107,6 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>  
>  #else /* !CONFIG_HUGETLB_PAGE */
>  
> -static inline int PageHuge(struct page *page)
> -{
> -	return 0;
> -}
> -
>  static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
>  {
>  }
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index 3be6bb18562d..7ae216a39c9e 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -35,18 +35,6 @@ static inline void ksm_exit(struct mm_struct *mm)
>  		__ksm_exit(mm);
>  }
>  
> -/*
> - * A KSM page is one of those write-protected "shared pages" or "merged pages"
> - * which KSM maps into multiple mms, wherever identical anonymous page content
> - * is found in VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any
> - * anon_vma, but to that page's node of the stable tree.
> - */
> -static inline int PageKsm(struct page *page)
> -{
> -	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
> -				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
> -}
> -
>  static inline struct stable_node *page_stable_node(struct page *page)
>  {
>  	return PageKsm(page) ? page_rmapping(page) : NULL;
> @@ -87,11 +75,6 @@ static inline void ksm_exit(struct mm_struct *mm)
>  {
>  }
>  
> -static inline int PageKsm(struct page *page)
> -{
> -	return 0;
> -}
> -
>  #ifdef CONFIG_MMU
>  static inline int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>  		unsigned long end, int advice, unsigned long *vm_flags)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6571dd78e984..fb1fc38b01ce 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -494,15 +494,6 @@ static inline int page_count(struct page *page)
>  	return atomic_read(&compound_head(page)->_count);
>  }
>  
> -#ifdef CONFIG_HUGETLB_PAGE
> -extern int PageHeadHuge(struct page *page_head);
> -#else /* CONFIG_HUGETLB_PAGE */
> -static inline int PageHeadHuge(struct page *page_head)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_HUGETLB_PAGE */
> -
>  static inline bool __compound_tail_refcounted(struct page *page)
>  {
>  	return !PageSlab(page) && !PageHeadHuge(page);
> @@ -571,53 +562,6 @@ static inline void init_page_count(struct page *page)
>  	atomic_set(&page->_count, 1);
>  }
>  
> -/*
> - * PageBuddy() indicate that the page is free and in the buddy system
> - * (see mm/page_alloc.c).
> - *
> - * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
> - * -2 so that an underflow of the page_mapcount() won't be mistaken
> - * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE. -128 can be created very
> - * efficiently by most CPU architectures.
> - */
> -#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)
> -
> -static inline int PageBuddy(struct page *page)
> -{
> -	return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
> -}
> -
> -static inline void __SetPageBuddy(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> -	atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
> -}
> -
> -static inline void __ClearPageBuddy(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(!PageBuddy(page), page);
> -	atomic_set(&page->_mapcount, -1);
> -}
> -
> -#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
> -
> -static inline int PageBalloon(struct page *page)
> -{
> -	return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE;
> -}
> -
> -static inline void __SetPageBalloon(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> -	atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
> -}
> -
> -static inline void __ClearPageBalloon(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(!PageBalloon(page), page);
> -	atomic_set(&page->_mapcount, -1);
> -}
> -
>  void put_page(struct page *page);
>  void put_pages_list(struct list_head *pages);
>  
> @@ -1006,26 +950,6 @@ void page_address_init(void);
>  #define page_address_init()  do { } while(0)
>  #endif
>  
> -/*
> - * On an anonymous page mapped into a user virtual memory area,
> - * page->mapping points to its anon_vma, not to a struct address_space;
> - * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
> - *
> - * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
> - * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
> - * and then page->mapping points, not to an anon_vma, but to a private
> - * structure which KSM associates with that merged page.  See ksm.h.
> - *
> - * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
> - *
> - * Please note that, confusingly, "page_mapping" refers to the inode
> - * address_space which maps the page from disk; whereas "page_mapped"
> - * refers to user virtual address space into which the page is mapped.
> - */
> -#define PAGE_MAPPING_ANON	1
> -#define PAGE_MAPPING_KSM	2
> -#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
> -
>  extern struct address_space *page_mapping(struct page *page);
>  
>  /* Neutral page->mapping pointer to address_space or anon_vma or other */
> @@ -1045,11 +969,6 @@ struct address_space *page_file_mapping(struct page *page)
>  	return page->mapping;
>  }
>  
> -static inline int PageAnon(struct page *page)
> -{
> -	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
> -}
> -
>  /*
>   * Return the pagecache index of the passed page.  Regular pagecache pages
>   * use ->index whereas swapcache pages use ->private
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index c851ff92d5b3..84d10b65cec6 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -289,6 +289,47 @@ PAGEFLAG_FALSE(HWPoison)
>  #define __PG_HWPOISON 0
>  #endif
>  
> +/*
> + * On an anonymous page mapped into a user virtual memory area,
> + * page->mapping points to its anon_vma, not to a struct address_space;
> + * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
> + *
> + * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
> + * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
> + * and then page->mapping points, not to an anon_vma, but to a private
> + * structure which KSM associates with that merged page.  See ksm.h.
> + *
> + * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
> + *
> + * Please note that, confusingly, "page_mapping" refers to the inode
> + * address_space which maps the page from disk; whereas "page_mapped"
> + * refers to user virtual address space into which the page is mapped.
> + */
> +#define PAGE_MAPPING_ANON	1
> +#define PAGE_MAPPING_KSM	2
> +#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
> +
> +static inline int PageAnon(struct page *page)
> +{
> +	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
> +}
> +
> +#ifdef CONFIG_KSM
> +/*
> + * A KSM page is one of those write-protected "shared pages" or "merged pages"
> + * which KSM maps into multiple mms, wherever identical anonymous page content
> + * is found in VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any
> + * anon_vma, but to that page's node of the stable tree.
> + */
> +static inline int PageKsm(struct page *page)
> +{
> +	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
> +				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
> +}
> +#else
> +TESTPAGEFLAG_FALSE(Ksm)
> +#endif
> +
>  u64 stable_page_flags(struct page *page);
>  
>  static inline int PageUptodate(struct page *page)
> @@ -426,6 +467,14 @@ static inline void ClearPageCompound(struct page *page)
>  
>  #endif /* !PAGEFLAGS_EXTENDED */
>  
> +#ifdef CONFIG_HUGETLB_PAGE
> +int PageHuge(struct page *page);
> +int PageHeadHuge(struct page *page);
> +#else
> +TESTPAGEFLAG_FALSE(Huge)
> +TESTPAGEFLAG_FALSE(HeadHuge)
> +#endif
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  /*
>   * PageHuge() only returns true for hugetlbfs pages, but not for
> @@ -480,6 +529,53 @@ static inline int PageTransTail(struct page *page)
>  #endif
>  
>  /*
> + * PageBuddy() indicate that the page is free and in the buddy system
> + * (see mm/page_alloc.c).
> + *
> + * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
> + * -2 so that an underflow of the page_mapcount() won't be mistaken
> + * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE. -128 can be created very
> + * efficiently by most CPU architectures.
> + */
> +#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)
> +
> +static inline int PageBuddy(struct page *page)
> +{
> +	return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageBuddy(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> +	atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageBuddy(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageBuddy(page), page);
> +	atomic_set(&page->_mapcount, -1);
> +}
> +
> +#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
> +
> +static inline int PageBalloon(struct page *page)
> +{
> +	return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageBalloon(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> +	atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageBalloon(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageBalloon(page), page);
> +	atomic_set(&page->_mapcount, -1);
> +}
> +
> +/*
>   * If network-based swap is enabled, sl*b must keep track of whether pages
>   * were allocated from pfmemalloc reserves.
>   */
> -- 
> 2.1.4
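
The PageBuddy helpers moved above encode a page state as a sentinel value
in ->_mapcount rather than as a bit in ->flags. A minimal user-space
sketch of that idea follows; it assumes a plain int in place of atomic_t
and uses assert() to stand in for VM_BUG_ON_PAGE(), so it models the
logic only and is not the kernel implementation.

```c
#include <assert.h>

/* User-space model only: ->_mapcount is an atomic_t in the kernel. */
struct page {
	int _mapcount;	/* -1 means "no mappings" */
};

#define PAGE_BUDDY_MAPCOUNT_VALUE	(-128)

/* A page is "buddy" (free, owned by the allocator) iff the sentinel is set. */
static inline int PageBuddy(const struct page *page)
{
	return page->_mapcount == PAGE_BUDDY_MAPCOUNT_VALUE;
}

static inline void __SetPageBuddy(struct page *page)
{
	assert(page->_mapcount == -1);	/* stands in for VM_BUG_ON_PAGE() */
	page->_mapcount = PAGE_BUDDY_MAPCOUNT_VALUE;
}

static inline void __ClearPageBuddy(struct page *page)
{
	assert(PageBuddy(page));	/* stands in for VM_BUG_ON_PAGE() */
	page->_mapcount = -1;
}
```

Because the sentinel sits well below -2, a mapcount that underflows by one
(to -2) is not mistaken for a buddy page, which is the point made in the
comment above PAGE_BUDDY_MAPCOUNT_VALUE.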

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 02/16] page-flags: trivial cleanup for PageTrans* helpers
  2015-03-19 17:08   ` Kirill A. Shutemov
@ 2015-03-23  0:12     ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-03-23  0:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:

> Use TESTPAGEFLAG_FALSE() to get it a bit cleaner.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Yeah, this is okay too.

> ---
>  include/linux/page-flags.h | 18 +++---------------
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 84d10b65cec6..327aabd9792e 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -511,21 +511,9 @@ static inline int PageTransTail(struct page *page)
>  }
>  
>  #else
> -
> -static inline int PageTransHuge(struct page *page)
> -{
> -	return 0;
> -}
> -
> -static inline int PageTransCompound(struct page *page)
> -{
> -	return 0;
> -}
> -
> -static inline int PageTransTail(struct page *page)
> -{
> -	return 0;
> -}
> +TESTPAGEFLAG_FALSE(TransHuge)
> +TESTPAGEFLAG_FALSE(TransCompound)
> +TESTPAGEFLAG_FALSE(TransTail)
>  #endif
>  
>  /*
> -- 
> 2.1.4
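
For reference, a sketch of what the three TESTPAGEFLAG_FALSE() lines stand
for. The macro body below is an assumption about its shape (a generated
Page<name>() that always returns 0), written for a user-space build; the
authoritative definition is the one in include/linux/page-flags.h.

```c
struct page;	/* opaque: the generated stubs never dereference it */

/* Assumed shape of the macro: generate a Page<uname>() that is always false. */
#define TESTPAGEFLAG_FALSE(uname)					\
static inline int Page##uname(const struct page *page)			\
{									\
	(void)page;	/* unused in the stub */			\
	return 0;							\
}

/* Equivalent to the three hand-written stubs the patch removes. */
TESTPAGEFLAG_FALSE(TransHuge)
TESTPAGEFLAG_FALSE(TransCompound)
TESTPAGEFLAG_FALSE(TransTail)
```

With CONFIG_TRANSPARENT_HUGEPAGE disabled, callers such as
PageTransHuge(page) then compile down to a constant 0.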

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-23  0:28   ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-03-23  0:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:

> Currently we take a naive approach to page flags on compound pages -- we set
> the flag on the page without considering whether the flag makes sense for a
> tail page or for a compound page in general. This patchset tries to sort this
> out by defining a per-flag policy on what needs to be done when a page-flag
> helper operates on a compound page.
> 
> The last patch in the patchset also sanitizes usage of page->mapping for tail
> pages. We don't define the meaning of page->mapping for tail pages. Currently
> it's always NULL, which can be inconsistent with the head page and potentially
> lead to problems.
> 
> For now I have caught one case of illegal usage of page flags or ->mapping:
> the sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
> This leads to setting the dirty bit on tail pages and to accesses to the tail
> page's ->mapping. I don't see any bad behaviour caused by this, but it is
> worth fixing anyway.

But there's nothing to fix there.  We're more used to having page->mapping
set by filesystems, but it is normal for drivers to have pages with NULL
page->mapping mapped into userspace (and it's not accidental that they
appear !PageAnon); and subpages of compound pages mapped into userspace,
and set_page_dirty applied to them.

> 
> This patchset makes more sense if you take my THP refcounting into
> account: we will see more compound pages mapped with PTEs and we need to
> define behaviour of flags on compound pages to avoid bugs.

Yes, I quite understand that you want to clarify the usage of different
page flags to yourself, to help towards a policy of what to do with each
of them when subpages of a huge compound page are mapped into userspace;
but I don't see that we need this patchset in the kernel now, given that
it adds unnecessary overhead into several low-level inline functions.

I'm surprised that Andrew has fast-tracked it into his mmotm tree:
I don't think it's harmful beyond the overhead, but it seems premature:
let's wait until we get some benefit too?

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-03-23  0:28   ` Hugh Dickins
@ 2015-03-23 10:04     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-23 10:04 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Sun, Mar 22, 2015 at 05:28:47PM -0700, Hugh Dickins wrote:
> On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:
> 
> > Currently we take naive approach to page flags on compound -- we set the
> > flag on the page without consideration if the flag makes sense for tail
> > page or for compound page in general. This patchset try to sort this out
> > by defining per-flag policy on what need to be done if page-flag helper
> > operate on compound page.
> > 
> > The last patch in patchset also sanitize usege of page->mapping for tail
> > pages. We don't define meaning of page->mapping for tail pages. Currently
> > it's always NULL, which can be inconsistent with head page and potentially
> > lead to problems.
> > 
> > For now I catched one case of illigal usage of page flags or ->mapping:
> > sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
> > It leads to setting dirty bit on tail pages and access to tail_page's
> > ->mapping. I don't see any bad behaviour caused by this, but worth fixing
> > anyway.
> 
> But there's nothing to fix there.  We're more used to having page->mapping
> set by filesystems, but it is normal for drivers to have pages with NULL
> page->mapping mapped into userspace (and it's not accidental that they
> appear !PageAnon); and subpages of compound pages mapped into userspace,
> and set_page_dirty applied to them.

Yes, it works until some sound driver decides it wants to use
page->mapping.

It's just pure luck that it happened to work in this particular case.

> > This patchset makes more sense if you take my THP refcounting into
> > account: we will see more compound pages mapped with PTEs and we need to
> > define behaviour of flags on compound pages to avoid bugs.
> 
> Yes, I quite understand that you want to clarify the usage of different
> page flags to yourself, to help towards a policy of what to do with each
> of them when subpages of a huge compound page are mapped into userspace;
> but I don't see that we need this patchset in the kernel now, given that
> it adds unnecessary overhead into several low-level inline functions.

We already have subpages of compound pages mapped to userspace -- the sound
case.

And what overhead are you talking about?

The check for the compound or head bit is practically free in most cases,
since you are going to check other bits in the same cache line anyway. It
is probably a bit more expensive if the flag is encoded in ->mapping or
somewhere else (on 32-bit x86 the ->mapping case is also free, since it's
in the same cache line as ->flags).

You only need to pay the expense if you hit a tail page, which is very rare
in the current kernel. I think we can pay this cost for correctness.

We will shave some cost off compound_head() if/when my refcounting patchset
gets merged: the barrier will no longer be needed.
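
To make the cost argument concrete, here is a minimal user-space sketch of
a flag helper that redirects to the head page. It is not the patchset's
code: struct toy_page, the TOY_PG_* bits and the toy_* functions are
hypothetical stand-ins, and the real compound_head() has details (such as
the barrier mentioned above) that this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; every name here is hypothetical. */
enum { TOY_PG_TAIL = 1u << 0, TOY_PG_DIRTY = 1u << 1 };

struct toy_page {
	unsigned long flags;
	struct toy_page *first_page;	/* meaningful only if TOY_PG_TAIL is set */
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	/* Common case: one bit test on a word the caller reads anyway. */
	if (!(page->flags & TOY_PG_TAIL))
		return page;
	/* Rare case: only tail pages pay for the extra dereference. */
	return page->first_page;
}

/* Test the dirty bit on the head page, whichever subpage was passed in. */
static bool toy_page_dirty(struct toy_page *page)
{
	return (toy_compound_head(page)->flags & TOY_PG_DIRTY) != 0;
}

/* Set the dirty bit on the head page, whichever subpage was passed in. */
static void toy_set_page_dirty(struct toy_page *page)
{
	toy_compound_head(page)->flags |= TOY_PG_DIRTY;
}
```

In this model a non-compound page takes only the first branch, so the
extra work is a single test of a bit in ->flags.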

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-23  0:02         ` Hugh Dickins
@ 2015-03-23 12:17           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-23 12:17 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Hansen, Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm,
	Jaroslav Kysela, Takashi Iwai, alsa-devel

On Sun, Mar 22, 2015 at 05:02:58PM -0700, Hugh Dickins wrote:
> On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:
> > On Thu, Mar 19, 2015 at 11:29:52AM -0700, Dave Hansen wrote:
> > > On 03/19/2015 10:08 AM, Kirill A. Shutemov wrote:
> > > > The odd exception is PG_dirty: sound uses compound pages and maps them
> > > > with PTEs. NO_COMPOUND triggers VM_BUG_ON() in set_page_dirty() on
> > > > handling shared fault. Let's use HEAD for PG_dirty.
> 
> It really depends on what you do with PageDirty of the head, when you
> get to support 4k pagecache with subpages of a huge compound page.
> 
> HEAD will be fine, so long as PageDirty on the head means the whole
> huge page must be written back.  I expect that's what you will choose;
> but one could consider that if a huge page is only mapped read-only,
> but a few subpages of it writable, then only the few need be written
> back, in which case ANY would be more appropriate.  NO_COMPOUND is
> certainly wrong.
> 
> But that does illustrate that I consider this patch series premature:
> it belongs with your huge pagecache implementation.  You seem to be
> "tidying up" and adding overhead to things that are fine as they are.

I agree, it can be ANY too, since we don't use PG_dirty anywhere at the
moment. My first thought was that it's better to match PG_dirty behaviour
with the LRU-related flags, but that need not be the case.
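
For readers following the ANY / HEAD / NO_COMPOUND discussion, a rough
user-space model of the three policies as they are described in this
thread. This is a sketch under assumptions, not the macros from the
patchset: all toy_* names are hypothetical, and where the kernel would
hit VM_BUG_ON() the model returns an error instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy names mirroring the ones discussed in the thread. */
enum toy_policy { TOY_ANY, TOY_HEAD, TOY_NO_COMPOUND };

enum { TOY_PG_HEAD = 1u << 0, TOY_PG_TAIL = 1u << 1, TOY_PG_DIRTY = 1u << 2 };

struct toy_page {
	unsigned long flags;
	struct toy_page *first_page;	/* meaningful only if TOY_PG_TAIL is set */
};

/* Pick the page a flag operation should act on, or NULL if disallowed. */
static struct toy_page *toy_policy_target(struct toy_page *page,
					  enum toy_policy policy)
{
	int compound = (page->flags & (TOY_PG_HEAD | TOY_PG_TAIL)) != 0;

	switch (policy) {
	case TOY_ANY:		/* act on exactly the page passed in */
		return page;
	case TOY_HEAD:		/* tail pages are redirected to the head */
		return (page->flags & TOY_PG_TAIL) ? page->first_page : page;
	case TOY_NO_COMPOUND:	/* flag is not valid on compound pages */
		return compound ? NULL : page;
	}
	return NULL;
}

/* Returns 0 on success, -1 where the kernel would trigger VM_BUG_ON(). */
static int toy_set_dirty(struct toy_page *page, enum toy_policy policy)
{
	struct toy_page *target = toy_policy_target(page, policy);

	if (!target)
		return -1;
	target->flags |= TOY_PG_DIRTY;
	return 0;
}
```

With a tail page as input, TOY_NO_COMPOUND fails (the sound-driver case),
TOY_HEAD dirties the head only, and TOY_ANY dirties the tail itself.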

BTW, do we make any use of PG_dirty on pages with ->mapping == NULL?
Should we avoid dirtying them in the first place?

> > > Can we get the sound guys to look at this, btw?  It seems like an odd
> > > thing that we probably don't want to keep around, right?
> > 
> > CC: +sound guys
> 
> I don't think this is peculiar to sound at all: there are other users
> of __GFP_COMP in the tree, aren't there?  And although some of them
> might turn out not to need it any more, I expect most of them still
> need it for the same reason they did originally.

I haven't seen any other __GFP_COMP user that gets its pages mapped to
userspace with PTEs. Have you? Probably I just haven't stepped on it.

... looking into the code a bit more: at least one fb driver has compound
pages mapped with PTEs.

> > I'm not sure what is right fix here. At the time adding __GFP_COMP was a
> > fix: see f3d48f0373c1.
> 
> The only thing special about this one, was that I failed to add
> __GFP_COMP at first.
> 
> The purpose of __GFP_COMP is to allow a >0-order page (originally, just
> a hugetlb page: see 2.5.60) to be mapped into userspace, and parts of it
> then subjected to get_user_pages (ptrace, futex, direct I/O, infiniband
> etc), and now even munmap, without destroying the integrity of the
> underlying >0-order page.
> 
> We don't bother with __GFP_COMP when a >0-order page cannot be mapped
> into userspace (except through /dev/mem or suchlike); we add __GFP_COMP
> when it might be, to get the right reference counting.

Wouldn't non-compound >0-order page allocation + split_page() work too?

> It's normal for set_page_dirty() to be called in the course of
> get_user_pages(), and it's normal for set_page_dirty() to be called
> when releasing the get_user_pages() references, and it's normal for
> set_page_dirty() to be called when munmap'ing a pte_dirty().
> 
> > 
> > Other odd part about __GFP_COMP here is that we have ->_mapcount in tail
> > pages to be used for both: mapcount of the individual page and for gup
> > pins. __compound_tail_refcounted() doesn't recognize that we don't need
> > tail page accounting for these pages.
> 
> So page->_mapcount of the tails is being used for both their mapcount
> and their reference count: that's certainly funny, and further reason
> to pursue your aim of simplifying the way THPs are refcounted.  But
> not responsible for any actual bug, I think?

A GUP pin would screw up page_mapcount() on these pages. It would affect
memory stats for the process and probably something else.

I think we can get __compound_tail_refcounted() to ignore these pages by
checking if page->mapping is NULL.
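
A toy user-space illustration of that idea, with loud caveats: the struct
layout and every toy_* name are hypothetical, and the real
__compound_tail_refcounted() has other conditions that this model ignores.
It only shows how a NULL ->mapping check would keep GUP pins out of the
tail page's mapcount for driver-allocated compound pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; field and function names are hypothetical, not kernel API. */
struct toy_page {
	void *mapping;	/* NULL for driver-allocated pages in this model */
	int mapcount;	/* number of PTEs mapping this tail page */
	int gup_pins;	/* pins tracked separately, for comparison only */
};

/* Should GUP pins on tails of this compound page be folded into mapcount? */
static bool toy_tail_refcounted(const struct toy_page *head)
{
	/* The suggested extra condition: skip pages without a mapping. */
	return head->mapping != NULL;
}

static void toy_gup_pin_tail(const struct toy_page *head,
			     struct toy_page *tail)
{
	tail->gup_pins++;
	if (toy_tail_refcounted(head))
		tail->mapcount++;	/* THP-style tail accounting */
}
```

In the model, a pin on a driver page leaves its mapcount untouched, so a
page_mapcount()-style reading still reflects only the PTE mappings.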

> > Hugh, I tried to ask you about the situation several times (last time on
> > the summit). Any comments?
> 
> I do remember we began a curtailed conversation about this at LSF/MM.
> I do not remember you asking about it earlier: when was that?

http://lkml.kernel.org/g/20141217004734.GA23150@node.dhcp.inet.fi

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-03-24 17:39   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 119+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-24 17:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Linux Kernel Mailing List,
	linux-mm

On Thu, Mar 19, 2015 at 8:08 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> Currently we take naive approach to page flags on compound -- we set the
> flag on the page without consideration if the flag makes sense for tail
> page or for compound page in general. This patchset try to sort this out
> by defining per-flag policy on what need to be done if page-flag helper
> operate on compound page.
>
> The last patch in patchset also sanitize usege of page->mapping for tail
> pages. We don't define meaning of page->mapping for tail pages. Currently
> it's always NULL, which can be inconsistent with head page and potentially
> lead to problems.
>
> For now I catched one case of illigal usage of page flags or ->mapping:
> sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
> It leads to setting dirty bit on tail pages and access to tail_page's
> ->mapping. I don't see any bad behaviour caused by this, but worth fixing
> anyway.

Do you mean the call to set_page_dirty() from zap_pte_range()?
I think this should be replaced with a vma operation:
vma->vm_ops->set_page_dirty()

>
> This patchset makes more sense if you take my THP refcounting into
> account: we will see more compound pages mapped with PTEs and we need to
> define behaviour of flags on compound pages to avoid bugs.
>
> Kirill A. Shutemov (16):
>   mm: consolidate all page-flags helpers in <linux/page-flags.h>
>   page-flags: trivial cleanup for PageTrans* helpers
>   page-flags: introduce page flags policies wrt compound pages
>   page-flags: define PG_locked behavior on compound pages
>   page-flags: define behavior of FS/IO-related flags on compound pages
>   page-flags: define behavior of LRU-related flags on compound pages
>   page-flags: define behavior SL*B-related flags on compound pages
>   page-flags: define behavior of Xen-related flags on compound pages
>   page-flags: define PG_reserved behavior on compound pages
>   page-flags: define PG_swapbacked behavior on compound pages
>   page-flags: define PG_swapcache behavior on compound pages
>   page-flags: define PG_mlocked behavior on compound pages
>   page-flags: define PG_uncached behavior on compound pages
>   page-flags: define PG_uptodate behavior on compound pages
>   page-flags: look on head page if the flag is encoded in page->mapping
>   mm: sanitize page->mapping for tail pages
>
>  fs/cifs/file.c             |   8 +-
>  include/linux/hugetlb.h    |   7 -
>  include/linux/ksm.h        |  17 ---
>  include/linux/mm.h         | 122 +----------------
>  include/linux/page-flags.h | 317 ++++++++++++++++++++++++++++++++++-----------
>  include/linux/pagemap.h    |  25 ++--
>  include/linux/poison.h     |   4 +
>  mm/filemap.c               |  15 ++-
>  mm/huge_memory.c           |   2 +-
>  mm/ksm.c                   |   2 +-
>  mm/memory-failure.c        |   2 +-
>  mm/memory.c                |   2 +-
>  mm/migrate.c               |   2 +-
>  mm/page_alloc.c            |   7 +
>  mm/shmem.c                 |   4 +-
>  mm/slub.c                  |   2 +
>  mm/swap_state.c            |   4 +-
>  mm/util.c                  |   5 +-
>  mm/vmscan.c                |   4 +-
>  mm/zswap.c                 |   4 +-
>  20 files changed, 294 insertions(+), 261 deletions(-)
>
> --
> 2.1.4

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-03-24 17:39   ` Konstantin Khlebnikov
@ 2015-03-24 20:04     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-24 20:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli,
	Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, Linux Kernel Mailing List, linux-mm

On Tue, Mar 24, 2015 at 08:39:49PM +0300, Konstantin Khlebnikov wrote:
> On Thu, Mar 19, 2015 at 8:08 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > Currently we take naive approach to page flags on compound -- we set the
> > flag on the page without consideration if the flag makes sense for tail
> > page or for compound page in general. This patchset try to sort this out
> > by defining per-flag policy on what need to be done if page-flag helper
> > operate on compound page.
> >
> > The last patch in patchset also sanitize usege of page->mapping for tail
> > pages. We don't define meaning of page->mapping for tail pages. Currently
> > it's always NULL, which can be inconsistent with head page and potentially
> > lead to problems.
> >
> > For now I catched one case of illigal usage of page flags or ->mapping:
> > sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
> > It leads to setting dirty bit on tail pages and access to tail_page's
> > ->mapping. I don't see any bad behaviour caused by this, but worth fixing
> > anyway.
> 
> Do you mean call of set_page_dirty() from zap_pte_range() ?

No. I trigger it earlier: set_page_dirty() from do_shared_fault().

> I think this should be replaced with vma operation:
> vma->vm_ops->set_page_dirty()

Does anybody know why we would want to dirty pages with ->mapping ==
NULL?

I don't see a place where we can make any use of this. We could probably
avoid dirtying such pages. Hm?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-23 12:17           ` Kirill A. Shutemov
@ 2015-03-24 22:54             ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-03-24 22:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Dave Hansen, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Jaroslav Kysela, Takashi Iwai,
	alsa-devel

On Mon, 23 Mar 2015, Kirill A. Shutemov wrote:
> On Sun, Mar 22, 2015 at 05:02:58PM -0700, Hugh Dickins wrote:
> > On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:
> > > On Thu, Mar 19, 2015 at 11:29:52AM -0700, Dave Hansen wrote:
> > > > On 03/19/2015 10:08 AM, Kirill A. Shutemov wrote:
> > > > > The odd exception is PG_dirty: sound uses compound pages and maps them
> > > > > with PTEs. NO_COMPOUND triggers VM_BUG_ON() in set_page_dirty() on
> > > > > handling shared fault. Let's use HEAD for PG_dirty.
> > 
> > It really depends on what you do with PageDirty of the head, when you
> > get to support 4k pagecache with subpages of a huge compound page.
> > 
> > HEAD will be fine, so long as PageDirty on the head means the whole
> > huge page must be written back.  I expect that's what you will choose;
> > but one could consider that if a huge page is only mapped read-only,
> > but a few subpages of it writable, then only the few need be written
> > back, in which case ANY would be more appropriate.  NO_COMPOUND is
> > certainly wrong.
> > 
> > But that does illustrate that I consider this patch series premature:
> > it belongs with your huge pagecache implementation.  You seem to be
> > "tidying up" and adding overhead to things that are fine as they are.
> 
> I agree, it can be ANY too, since we don't use PG_dirty anywhere at the
> moment. My first thought was that it's better to match PG_dirty behaviour
> with the LRU-related flags, but that need not be the case.

No, yes, we do treat Dirty differently from LRU.
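
The three policies named in this exchange (ANY, HEAD, NO_COMPOUND) can be
illustrated with a small userspace toy model. This is only a sketch, not
kernel code: struct toy_page, the toy_* helpers and TOY_PG_DIRTY are
hypothetical names invented for the illustration, and a NULL return stands
in for the point where the kernel would hit VM_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies, named after the ones discussed in the thread. */
enum toy_policy { TOY_ANY, TOY_HEAD, TOY_NO_COMPOUND };

#define TOY_PG_DIRTY 0x1UL

struct toy_page {
	unsigned long flags;
	/* NULL for an order-0 page; otherwise the compound head
	 * (the head page points to itself). */
	struct toy_page *head;
};

static bool toy_is_compound(const struct toy_page *p)
{
	return p->head != NULL;
}

/* Pick the page the flag operation lands on, or NULL if the policy
 * forbids the operation for this page. */
static struct toy_page *toy_resolve(struct toy_page *p, enum toy_policy pol)
{
	switch (pol) {
	case TOY_ANY:		/* operate on exactly the page passed in */
		return p;
	case TOY_HEAD:		/* redirect tail pages to their head */
		return toy_is_compound(p) ? p->head : p;
	case TOY_NO_COMPOUND:	/* flag is not allowed on compound pages */
		return toy_is_compound(p) ? NULL : p;
	}
	return NULL;
}

/* Returns false where the real helper would trigger a BUG. */
static bool toy_set_dirty(struct toy_page *p, enum toy_policy pol)
{
	struct toy_page *target = toy_resolve(p, pol);

	if (!target)
		return false;
	target->flags |= TOY_PG_DIRTY;
	return true;
}

/* Turn an array of n pages into one compound page with pages[0] as head. */
static void toy_make_compound(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pages[i].flags = 0;
		pages[i].head = &pages[0];
	}
}
```

Under this model, dirtying a PTE-mapped tail page is rejected with
NO_COMPOUND, marks the head (and so the whole compound page) with HEAD, and
marks only that subpage with ANY, which is the distinction being weighed
above for writeback.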

> 
> BTW, do we make any use of PG_dirty on pages with ->mapping == NULL?

No use that I can recall; but I suppose it's possible there's some
driver which does make use of it (if so, then you should choose ANY).

> Should we avoid dirtying them in the first place?

I don't think so: to do so would add more branches in hot paths,
just to avoid a rare case which works fine without them; and
prevent a driver from using it, in the unlikely case that's so.

> 
> > > > Can we get the sound guys to look at this, btw?  It seems like an odd
> > > > thing that we probably don't want to keep around, right?
> > > 
> > > CC: +sound guys
> > 
> > I don't think this is peculiar to sound at all: there are other users
> > of __GFP_COMP in the tree, aren't there?  And although some of them
> > might turn out not to need it any more, I expect most of them still
> > need it for the same reason they did originally.
> 
> I haven't seen any other __GFP_COMP user which gets it mapped to user-space
> with PTEs. Have you? Probably I just haven't stepped on one.

I don't know why a driver would use __GFP_COMP if it cannot get mapped
into user-space (except copy-and-paste from a driver that needed it to
a driver that did not): if there's no chance of mapping into userspace,
then an ordinary >0-order allocation is good enough, isn't it?

> 
> ... looking into the code a bit more: at least one fb driver has compound
> pages mapped with PTEs.

Good, you've saved me from looking for them.  I would expect every
__GFP_COMP allocation to be mappable into user-space, with silly
exceptions.

> 
> > > I'm not sure what the right fix is here. At the time, adding __GFP_COMP
> > > was a fix: see f3d48f0373c1.
> > 
> > The only thing special about this one, was that I failed to add
> > __GFP_COMP at first.
> > 
> > The purpose of __GFP_COMP is to allow a >0-order page (originally, just
> > a hugetlb page: see 2.5.60) to be mapped into userspace, and parts of it
> > then subjected to get_user_pages (ptrace, futex, direct I/O, infiniband
> > etc), and now even munmap, without destroying the integrity of the
> > underlying >0-order page.
> > 
> > We don't bother with __GFP_COMP when a >0-order page cannot be mapped
> > into userspace (except through /dev/mem or suchlike); we add __GFP_COMP
> > when it might be, to get the right reference counting.
> 
> Wouldn't non-compound >0-order page allocation + split_page() work too?

That works very well for me in huge tmpfs, yes :)

But I think the typical __GFP_COMP-using driver wants one large
contiguous area that it holds as a single piece, without worrying
about the ref-counting implications of when it's mapped into
user-space, then partially unmapped, or accessed via get_user_pages.
It can't risk losing parts of its buffer at the whim of its users.

I expect you're right that drivers could be converted over to
manage their buffers differently, without __GFP_COMP.  But __GFP_COMP
existed already for hugetlbfs, and was easy for drivers to use safely:
the whole being held until the head is freed.  (And split_page() was
added later in history - I think so the surplus tail end of a high
order page could be freed immediately.)
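
The reference-counting point can be sketched with a userspace toy model.
This is an assumption-laden illustration, not kernel code: struct toy_page
and the toy_* helpers are hypothetical, and the model ignores the
THP-specific tail accounting discussed elsewhere in this thread. It only
shows why a get/put on a tail page is harmless when references are
redirected to the head, and why it can free part of the buffer when they
are not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	int count;		/* reference count */
	bool freed;		/* set once the count drops to zero */
	struct toy_page *head;	/* compound head, or NULL if not compound */
};

/* With a compound page, every reference is taken on the head. */
static struct toy_page *toy_ref_target(struct toy_page *p)
{
	return p->head ? p->head : p;
}

static void toy_get_page(struct toy_page *p)
{
	toy_ref_target(p)->count++;
}

static void toy_put_page(struct toy_page *p)
{
	struct toy_page *t = toy_ref_target(p);

	if (--t->count == 0)
		t->freed = true;
}

/* Model a >0-order allocation of n pages: only the first page starts
 * with a reference; 'comp' selects the __GFP_COMP-like behaviour. */
static void toy_alloc(struct toy_page *pages, size_t n, bool comp)
{
	for (size_t i = 0; i < n; i++) {
		pages[i].count = (i == 0) ? 1 : 0;
		pages[i].freed = false;
		pages[i].head = comp ? &pages[0] : NULL;
	}
}
```

In the non-compound case a get/put pair on page 2 takes its own count from
0 to 1 and back to 0, so that page is "freed" while the driver still
believes it owns the whole buffer. In the compound case the same pair only
moves the head's count from 1 to 2 and back, and nothing is released until
the driver drops the head.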

> 
> > It's normal for set_page_dirty() to be called in the course of
> > get_user_pages(), and it's normal for set_page_dirty() to be called
> > when releasing the get_user_pages() references, and it's normal for
> > set_page_dirty() to be called when munmap'ing a pte_dirty().
> > 
> > > 
> > > Other odd part about __GFP_COMP here is that we have ->_mapcount in tail
> > > pages to be used for both: mapcount of the individual page and for gup
> > > pins. __compound_tail_refcounted() doesn't recognize that we don't need
> > > tail page accounting for these pages.
> > 
> > So page->_mapcount of the tails is being used for both their mapcount
> > and their reference count: that's certainly funny, and further reason
> > to pursue your aim of simplifying the way THPs are refcounted.  But
> > not responsible for any actual bug, I think?
> 
> GUP pin would screw up page_mapcount() on these pages. It would affect
> memory stats for the process and probably something else.

Yes, the GUP pin would increment page_mapcount() without an additional
mapping - but can only happen once the page has already been mapped,
so FILE_MAPPED stats unaffected?  I'm not sure; but surely it wouldn't
work as well when unmapped before unpinned, since the unmapping will
see "still mapped" and the unpinning won't do anything with FILE_MAPPED.

Unmapping before unpinning is an uncommon path; but it can't be ignored,
it is the path which demanded __GFP_COMP in the first place.
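
The ordering problem described above can be shown with a toy counter. This
is a hedged userspace sketch, not kernel code: struct toy_state and the
toy_* helpers are hypothetical, the counter starts at 0 rather than the
kernel's -1, and file_mapped merely stands in for a FILE_MAPPED-style
statistic that is updated only by map and unmap.

```c
#include <assert.h>

struct toy_state {
	int mapcount;		/* shared by mappings and GUP pins */
	long file_mapped;	/* stat: +1 on first map, -1 on last unmap */
};

static void toy_map(struct toy_state *s)
{
	if (s->mapcount++ == 0)
		s->file_mapped++;
}

static void toy_unmap(struct toy_state *s)
{
	if (--s->mapcount == 0)
		s->file_mapped--;
}

/* A pin borrows the same counter but never touches the statistic. */
static void toy_gup_pin(struct toy_state *s)
{
	s->mapcount++;
}

static void toy_gup_unpin(struct toy_state *s)
{
	s->mapcount--;
}
```

With the common order (map, pin, unpin, unmap) the statistic returns to
zero. With unmap before unpin, the unmap sees a non-zero count and skips
the decrement, and the later unpin does not touch the statistic, so it
stays at 1 even though nothing is mapped any more.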

Looks like extending THP by-mapcount refcounting to other compound pages
was not such a good idea.  But since nobody has noticed, we may not need
a more urgent fix than your simplification of THP refcounting.

> 
> I think we can get __compound_tail_refcounted() to ignore these pages by
> checking if page->mapping is NULL.

I forget what's in page->mapping on the THP tails.  Or do you mean
page->mapping of head?  It would be better not to rely on that, I'm
not certain that no driver could set page->mapping of compound head.
There's probably some field or flag on the tails that you could use;
but I don't know that it's needed in a hurry.

> 
> > > Hugh, I tried to ask you about the situation several times (last time on
> > > the summit). Any comments?
> > 
> > I do remember we began a curtailed conversation about this at LSF/MM.
> > I do not remember you asking about it earlier: when was that?
> 
> http://lkml.kernel.org/g/20141217004734.GA23150@node.dhcp.inet.fi

Hmm, curious: never reached me (and I should have seen that on linux-mm
even if not Cc'ed); unless I deleted it by accident, that's not unknown.

And in that you explain as I've said above, so you didn't really need
me anyway.

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-03-23 10:04     ` Kirill A. Shutemov
@ 2015-03-24 23:42       ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-03-24 23:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On Mon, 23 Mar 2015, Kirill A. Shutemov wrote:
> On Sun, Mar 22, 2015 at 05:28:47PM -0700, Hugh Dickins wrote:
> > On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:
> > 
> > > Currently we take a naive approach to page flags on compound pages -- we
> > > set the flag on the page without considering whether the flag makes sense
> > > for a tail page or for a compound page in general. This patchset tries to
> > > sort this out by defining a per-flag policy for what needs to be done when
> > > a page-flag helper operates on a compound page.
> > > 
> > > The last patch in the patchset also sanitizes usage of page->mapping for
> > > tail pages. We don't define the meaning of page->mapping for tail pages.
> > > Currently it's always NULL, which can be inconsistent with the head page
> > > and potentially lead to problems.
> > > 
> > > So far I have caught one case of illegal usage of page flags or ->mapping:
> > > the sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
> > > This leads to setting the dirty bit on tail pages and to accesses to the
> > > tail page's ->mapping. I don't see any bad behaviour caused by this, but it
> > > is worth fixing anyway.
> > 
> > But there's nothing to fix there.  We're more used to having page->mapping
> > set by filesystems, but it is normal for drivers to have pages with NULL
> > page->mapping mapped into userspace (and it's not accidental that they
> > appear !PageAnon); and subpages of compound pages mapped into userspace,
> > and set_page_dirty applied to them.
> 
> Yes, it works until some sound driver decides it wants to use
> page->mapping.

(a) Why would it want to use page->mapping?
(b) What's the problem if it wants to use page->mapping?
(c) Or perhaps some __GFP_COMP driver does already use page->mapping?

The code works fine as is (er, modulo the fact that someone has tried
to use page_mapcount for two different things at the same time), and
has worked for years.

If new needs emerge, we can make suitable changes.  If your refcounting
rework needs a change here, fine, then just make these patches a part
of that set.  But please don't impose new rules for no reason.

> 
> It's just pure luck that it happened to work in this particular case.

We were lucky that it fitted together without needing extra code, yes.
But this didn't happen by accident, it was known and considered.

> 
> > > This patchset makes more sense if you take my THP refcounting into
> > > account: we will see more compound pages mapped with PTEs and we need to
> > > define behaviour of flags on compound pages to avoid bugs.
> > 
> > Yes, I quite understand that you want to clarify the usage of different
> > page flags to yourself, to help towards a policy of what to do with each
> > of them when subpages of a huge compound page are mapped into userspace;
> > but I don't see that we need this patchset in the kernel now, given that
> > it adds unnecessary overhead into several low-level inline functions.
> 
> We already have subpages of compound page mapped to userspace -- the sound
> case.
> 
> And what overhead are you talking about?
> 
> A check for the compound or head bit is practically free in most cases,
> since you are going to check other bits in the same cache line anyway. It is
> probably a bit more expensive if the flag is encoded in ->mapping or somewhere
> else (on 32-bit x86 the ->mapping case is also free, since it's in the same
> cache line as ->flags).

Good that it's practically free on x86 (though your "practically"
suggests it's not quite free).  Then there's also the extra icache.

This is small stuff, I do agree (though small stuff concealed in
common inline functions we tend to think of as lightweight).

I care more about not adding unnecessary code,
and not fixing what's not broken.

> 
> You only pay the expense if you hit a tail page, which is very rare in the
> current kernel. I think we can pay this cost for correctness.

But it's correct as is.

> 
> We will shave some of the cost of compound_head() if/when my refcounting
> patchset gets merged: no need for a barrier anymore.

And if these changes are necessary for that, sure, go ahead:
but as part of that work.

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-24 22:54             ` Hugh Dickins
@ 2015-03-25 10:23               ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-25 10:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Hansen, Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm,
	Jaroslav Kysela, Takashi Iwai, alsa-devel

On Tue, Mar 24, 2015 at 03:54:00PM -0700, Hugh Dickins wrote:
> On Mon, 23 Mar 2015, Kirill A. Shutemov wrote:
> > Should we avoid dirtying them in the first place?
> 
> I don't think so: to do so would add more branches in hot paths,
> just to avoid a rare case which works fine without them; and
> prevent a driver from using it, in the unlikely case that's so.

It's branches vs. useless atomic operations.

> > GUP pin would screw up page_mapcount() on these pages. It would affect
> > memory stats for the process and probably something else.
> 
> Yes, the GUP pin would increment page_mapcount() without an additional
> mapping - but can only happen once the page has already been mapped,
> so FILE_MAPPED stats unaffected?  I'm not sure; but surely it wouldn't
> work as well when unmapped before unpinned, since the unmapping will
> see "still mapped" and the unpinning won't do anything with FILE_MAPPED.
> 
> Unmapping before unpinning is an uncommon path; but it can't be ignored,
> it is the path which demanded __GFP_COMP in the first place.
> 
> Looks like extending THP by-mapcount refcounting to other compound pages
> was not such a good idea.  But since nobody has noticed, we may not need
> a more urgent fix than your simplification of THP refcounting.

I think PSS and /proc/kpagecount are broken by this.

> > I think we can get __compound_tail_refcounted() ignore these pages by
> > checking if page->mapping is NULL.
> 
> I forget what's in page->mapping on the THP tails.

NULL. We never set ->mapping on any tail pages. That's why I want to outlaw
using that value: it just doesn't match the head page's ->mapping for
some compound pages. And for others it matches only because nobody
touches it for any subpage.

> Or do you mean page->mapping of head?  It would be better not to rely on
> that, I'm not certain that no driver could set page->mapping of compound
> head.  There's probably some field or flag on the tails that you could
> use; but I don't know that it's needed in a hurry.

We only need tail refcounting for THP, so I think this should fix the issue:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4a3a38522ab4..9ab432660adb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -456,7 +456,7 @@ static inline int page_count(struct page *page)
 
 static inline bool __compound_tail_refcounted(struct page *page)
 {
-       return !PageSlab(page) && !PageHeadHuge(page);
+       return !PageSlab(page) && !PageHeadHuge(page) && PageAnon(page);
 }
 
 /*
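
The effect of the one-line change above can be sketched as a pair of
predicates in a userspace toy model. This is an illustration only: struct
toy_page and its three booleans are hypothetical stand-ins for PageSlab(),
PageHeadHuge() and PageAnon(), and the two functions simply mirror the
removed and added lines of the diff.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	bool slab;	/* stands in for PageSlab() */
	bool head_huge;	/* stands in for PageHeadHuge() */
	bool anon;	/* stands in for PageAnon() */
};

/* Mirrors the removed line of the diff. */
static bool toy_tail_refcounted_old(const struct toy_page *page)
{
	return !page->slab && !page->head_huge;
}

/* Mirrors the added line: additionally require an anonymous page. */
static bool toy_tail_refcounted_new(const struct toy_page *page)
{
	return !page->slab && !page->head_huge && page->anon;
}
```

A driver's __GFP_COMP buffer (not slab, not hugetlb, not anonymous) gets
tail refcounting under the old predicate and is excluded by the new one,
while an anonymous THP satisfies both.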

-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-03-24 23:42       ` Hugh Dickins
@ 2015-03-25 10:55         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-25 10:55 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Tue, Mar 24, 2015 at 04:42:48PM -0700, Hugh Dickins wrote:
> On Mon, 23 Mar 2015, Kirill A. Shutemov wrote:
> > Yes, it works until some sound driver decides it wants to use
> > page->mapping.
> 
> (a) Why would it want to use page->mapping?

No idea.

> (b) What's the problem if it wants to use page->mapping?

It would need to be initialized for all subpages for core mm to see the correct
value. And this doesn't match the current ->mapping users of __GFP_COMP
pages (THP and hugetlb), which initialize ->mapping only for head pages.

> (c) Or perhaps some __GFP_COMP driver does already use page->mapping?

I haven't found any.

> > It's just pure luck that it happened to work in this particular case.
> 
> We were lucky that it fitted together without needing extra code, yes.
> But this didn't happen by accident, it was known and considered.

I don't agree it was considered well enough.

> > You only need to pay the expense if you hit a tail page, which is very rare
> > in the current kernel. I think we can pay this cost for correctness.
> 
> But it's correct as is.

See above.

> > 
> > We will shave some cost off compound_head() if/when my refcounting patchset
> > gets merged: no need for a barrier anymore.
> 
> And if these changes are necessary for that, sure, go ahead:
> but as part of that work.

I believe the patchset has value on its own. And having it merged makes my
life easier. But it's up to Andrew.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages
  2015-03-25 10:23               ` Kirill A. Shutemov
@ 2015-03-25 18:56                 ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-03-25 18:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Dave Hansen, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Jaroslav Kysela, Takashi Iwai,
	alsa-devel

On Wed, 25 Mar 2015, Kirill A. Shutemov wrote:
> 
> We only need tail refcounting for THP, so I think this should fix the issue:
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4a3a38522ab4..9ab432660adb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -456,7 +456,7 @@ static inline int page_count(struct page *page)
>  
>  static inline bool __compound_tail_refcounted(struct page *page)
>  {
> -       return !PageSlab(page) && !PageHeadHuge(page);
> +       return !PageSlab(page) && !PageHeadHuge(page) && PageAnon(page);
>  }
>  
>  /*

Yes, that should be a good fix for the mapcount issue.
And no coincidence that it's just what I needed too,
when reusing the PG_compound_lock bit: see my 10/24
(which had to rearrange mm.h, not having your 1/16).

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 04/16] page-flags: define PG_locked behavior on compound pages
  2015-03-19 17:08   ` Kirill A. Shutemov
@ 2015-03-27 15:11     ` Mateusz Krawczuk
  -1 siblings, 0 replies; 119+ messages in thread
From: Mateusz Krawczuk @ 2015-03-27 15:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, linux-kernel

Hi!

This patch has broken the build of linux-next since 2015-03-25 on ARM using
exynos_defconfig with arm-linux-gnueabi-linaro_4.7.4-2014.04,
arm-linux-gnueabi-linaro_4.8.3-2014.04 and
arm-linux-gnueabi-4.7.3-12ubuntu1 (from Ubuntu 14.04 LTS). The compiler shows
this error message:
mm/migrate.c: In function ‘migrate_pages’:
mm/migrate.c:1148:1: internal compiler error: in push_minipool_fix, at config/arm/arm.c:13500
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.

It builds fine with arm-linux-gnueabi-linaro_4.9.1-2014.07.

Best Regards
Mateusz Krawczuk
Samsung R&D Institute Poland

On 19.03.2015 at 18:08, Kirill A. Shutemov wrote:
> lock_page() must operate on the whole compound page. It doesn't make
> much sense to lock part of a compound page. Change the code to use the head
> page's PG_locked if a tail page is passed.
>
> This patch also gets rid of the custom helper functions --
> __set_page_locked() and __clear_page_locked(). They are replaced with
> helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Passing tail pages to
> these helpers would trigger VM_BUG_ON().
>
> SLUB uses PG_locked as a bit spin lock. IIUC, tail pages should never
> appear there. VM_BUG_ON() is added to make sure that this assumption is
> correct.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   fs/cifs/file.c             |  8 ++++----
>   include/linux/page-flags.h |  2 +-
>   include/linux/pagemap.h    | 25 ++++++++-----------------
>   mm/filemap.c               | 15 +++++++++------
>   mm/ksm.c                   |  2 +-
>   mm/memory-failure.c        |  2 +-
>   mm/migrate.c               |  2 +-
>   mm/shmem.c                 |  4 ++--
>   mm/slub.c                  |  2 ++
>   mm/swap_state.c            |  4 ++--
>   mm/vmscan.c                |  4 ++--
>   mm/zswap.c                 |  4 ++--
>   12 files changed, 35 insertions(+), 39 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index ca30c391a894..b9fd85dfee9b 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3413,13 +3413,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   	 * should have access to this page, we're safe to simply set
>   	 * PG_locked without checking it first.
>   	 */
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	rc = add_to_page_cache_locked(page, mapping,
>   				      page->index, GFP_KERNEL);
>
>   	/* give up if we can't stick it in the cache */
>   	if (rc) {
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   		return rc;
>   	}
>
> @@ -3440,10 +3440,10 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   		if (*bytes + PAGE_CACHE_SIZE > rsize)
>   			break;
>
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (add_to_page_cache_locked(page, mapping, page->index,
>   								GFP_KERNEL)) {
> -			__clear_page_locked(page);
> +			__ClearPageLocked(page);
>   			break;
>   		}
>   		list_move_tail(&page->lru, tmplist);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 32ea62c0ad30..10bdde20b14c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -269,7 +269,7 @@ static inline struct page *compound_head_fast(struct page *page)
>   	return page;
>   }
>
> -TESTPAGEFLAG(Locked, locked, ANY)
> +__PAGEFLAG(Locked, locked, NO_TAIL)
>   PAGEFLAG(Error, error, ANY) TESTCLEARFLAG(Error, error, ANY)
>   PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
>   	__SETPAGEFLAG(Referenced, referenced, ANY)
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 4b3736f7065c..7c3790764795 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -426,18 +426,9 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>   				unsigned int flags);
>   extern void unlock_page(struct page *page);
>
> -static inline void __set_page_locked(struct page *page)
> -{
> -	__set_bit(PG_locked, &page->flags);
> -}
> -
> -static inline void __clear_page_locked(struct page *page)
> -{
> -	__clear_bit(PG_locked, &page->flags);
> -}
> -
>   static inline int trylock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
>   }
>
> @@ -490,9 +481,9 @@ extern int wait_on_page_bit_killable_timeout(struct page *page,
>
>   static inline int wait_on_page_locked_killable(struct page *page)
>   {
> -	if (PageLocked(page))
> -		return wait_on_page_bit_killable(page, PG_locked);
> -	return 0;
> +	if (!PageLocked(page))
> +		return 0;
> +	return wait_on_page_bit_killable(compound_head(page), PG_locked);
>   }
>
>   extern wait_queue_head_t *page_waitqueue(struct page *page);
> @@ -511,7 +502,7 @@ static inline void wake_up_page(struct page *page, int bit)
>   static inline void wait_on_page_locked(struct page *page)
>   {
>   	if (PageLocked(page))
> -		wait_on_page_bit(page, PG_locked);
> +		wait_on_page_bit(compound_head(page), PG_locked);
>   }
>
>   /*
> @@ -656,17 +647,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>
>   /*
>    * Like add_to_page_cache_locked, but used to add newly allocated pages:
> - * the page is new, so we can just run __set_page_locked() against it.
> + * the page is new, so we can just run __SetPageLocked() against it.
>    */
>   static inline int add_to_page_cache(struct page *page,
>   		struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
>   {
>   	int error;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
>   	if (unlikely(error))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	return error;
>   }
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 12548d03c11d..467768d4263b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -615,11 +615,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
>   	void *shadow = NULL;
>   	int ret;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	ret = __add_to_page_cache_locked(page, mapping, offset,
>   					 gfp_mask, &shadow);
>   	if (unlikely(ret))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	else {
>   		/*
>   		 * The page might have been evicted from cache only
> @@ -742,6 +742,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
>    */
>   void unlock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	VM_BUG_ON_PAGE(!PageLocked(page), page);
>   	clear_bit_unlock(PG_locked, &page->flags);
>   	smp_mb__after_atomic();
> @@ -806,18 +807,20 @@ EXPORT_SYMBOL_GPL(page_endio);
>    */
>   void __lock_page(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	__wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
> +	__wait_on_bit_lock(page_waitqueue(page_head), &wait, bit_wait_io,
>   							TASK_UNINTERRUPTIBLE);
>   }
>   EXPORT_SYMBOL(__lock_page);
>
>   int __lock_page_killable(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	return __wait_on_bit_lock(page_waitqueue(page), &wait,
> +	return __wait_on_bit_lock(page_waitqueue(page_head), &wait,
>   					bit_wait_io, TASK_KILLABLE);
>   }
>   EXPORT_SYMBOL_GPL(__lock_page_killable);
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 4162dce2eb44..23138e99a531 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1884,7 +1884,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
>
>   		SetPageDirty(new_page);
>   		__SetPageUptodate(new_page);
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   	}
>
>   	return new_page;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d487f8dc6d39..399eee44d13d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1136,7 +1136,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>   	/*
>   	 * We ignore non-LRU pages for good reasons.
>   	 * - PG_locked is only well defined for LRU pages and a few others
> -	 * - to avoid races with __set_page_locked()
> +	 * - to avoid races with __SetPageLocked()
>   	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>   	 * The check (unnecessarily) ignores LRU pages being isolated and
>   	 * walked by the page reclaim code, however that's not a big loss.
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6aa9a4222ea9..114602a68111 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1734,7 +1734,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>   		flush_tlb_range(vma, mmun_start, mmun_end);
>
>   	/* Prepare a page as a migration target */
> -	__set_page_locked(new_page);
> +	__SetPageLocked(new_page);
>   	SetPageSwapBacked(new_page);
>
>   	/* anon mapping, we can simply copy page->mapping to the new page: */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 80b360c7bcd1..2e2b943c8e62 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -981,7 +981,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>   	copy_highpage(newpage, oldpage);
>   	flush_dcache_page(newpage);
>
> -	__set_page_locked(newpage);
> +	__SetPageLocked(newpage);
>   	SetPageUptodate(newpage);
>   	SetPageSwapBacked(newpage);
>   	set_page_private(newpage, swap_index);
> @@ -1173,7 +1173,7 @@ repeat:
>   		}
>
>   		__SetPageSwapBacked(page);
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (sgp == SGP_WRITE)
>   			__SetPageReferenced(page);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2584d4ff02eb..f33ae2b7a5e7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -338,11 +338,13 @@ static inline int oo_objects(struct kmem_cache_order_objects x)
>    */
>   static __always_inline void slab_lock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	bit_spin_lock(PG_locked, &page->flags);
>   }
>
>   static __always_inline void slab_unlock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	__bit_spin_unlock(PG_locked, &page->flags);
>   }
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 405923f77334..d1c4a25b4362 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -357,7 +357,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -371,7 +371,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 260c413d39cd..dc6cd51577a6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1062,7 +1062,7 @@ unmap:
>   				VM_BUG_ON_PAGE(PageSwapCache(page), page);
>   				if (!page_freeze_refs(page, 1))
>   					goto keep_locked;
> -				__clear_page_locked(page);
> +				__ClearPageLocked(page);
>   				count_vm_event(PGLAZYFREED);
>   				goto free_it;
>   			}
> @@ -1174,7 +1174,7 @@ unmap:
>   		 * we obviously don't have to worry about waking up a process
>   		 * waiting on the page lock, because there are no references.
>   		 */
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   free_it:
>   		nr_reclaimed++;
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4249e82ff934..f8583f1fc938 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -490,7 +490,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -501,7 +501,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
>
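
The locking rule in the quoted patch (any subpage resolves to the head page's
PG_locked, and the non-atomic setter refuses tail pages) can be sketched in
user space as follows. This is only a model under stated assumptions: struct
model_page, its fields and the model_* helpers are hypothetical, and a plain
bool replaces the kernel's atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; field names are hypothetical, not the kernel layout. */
struct model_page {
	bool locked;              /* stands in for PG_locked */
	struct model_page *head;  /* NULL for a head or non-compound page */
};

/* Resolve any subpage to the page that owns the lock bit. */
static struct model_page *model_compound_head(struct model_page *page)
{
	return page->head ? page->head : page;
}

/* Like trylock_page() after the patch: always operate on the head page. */
static bool model_trylock_page(struct model_page *page)
{
	page = model_compound_head(page);
	if (page->locked)
		return false;
	page->locked = true;  /* the kernel uses an atomic test-and-set here */
	return true;
}

/* Like unlock_page() after the patch: redirect to the head, then clear. */
static void model_unlock_page(struct model_page *page)
{
	page = model_compound_head(page);
	assert(page->locked);  /* mirrors VM_BUG_ON_PAGE(!PageLocked(page), page) */
	page->locked = false;
}

/* NO_TAIL-style setter: must never be handed a tail page. */
static void model_set_locked_nonatomic(struct model_page *page)
{
	assert(page->head == NULL);
	page->locked = true;
}
```

In this model, locking through a tail page sets the bit on the head only, so a
second attempt through the head fails until the lock is dropped through either
page.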



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 04/16] page-flags: define PG_locked behavior on compound pages
@ 2015-03-27 15:11     ` Mateusz Krawczuk
  0 siblings, 0 replies; 119+ messages in thread
From: Mateusz Krawczuk @ 2015-03-27 15:11 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

Hi!

This patch has broken the build of linux-next since 2015-03-25 on ARM using
exynos_defconfig with arm-linux-gnueabi-linaro_4.7.4-2014.04,
arm-linux-gnueabi-linaro_4.8.3-2014.04 and
arm-linux-gnueabi-4.7.3-12ubuntu1 (from Ubuntu 14.04 LTS). The compiler shows
this error message:
mm/migrate.c: In function ‘migrate_pages’:
mm/migrate.c:1148:1: internal compiler error: in push_minipool_fix, at config/arm/arm.c:13500
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.

It builds fine with arm-linux-gnueabi-linaro_4.9.1-2014.07.

Best Regards
Mateusz Krawczuk
Samsung R&D Institute Poland

On 19.03.2015 at 18:08, Kirill A. Shutemov wrote:
> lock_page() must operate on the whole compound page. It doesn't make
> much sense to lock part of a compound page. Change the code to use the head
> page's PG_locked if a tail page is passed.
>
> This patch also gets rid of the custom helper functions --
> __set_page_locked() and __clear_page_locked(). They are replaced with
> helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Passing tail pages to
> these helpers would trigger VM_BUG_ON().
>
> SLUB uses PG_locked as a bit spin lock. IIUC, tail pages should never
> appear there. VM_BUG_ON() is added to make sure that this assumption is
> correct.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   fs/cifs/file.c             |  8 ++++----
>   include/linux/page-flags.h |  2 +-
>   include/linux/pagemap.h    | 25 ++++++++-----------------
>   mm/filemap.c               | 15 +++++++++------
>   mm/ksm.c                   |  2 +-
>   mm/memory-failure.c        |  2 +-
>   mm/migrate.c               |  2 +-
>   mm/shmem.c                 |  4 ++--
>   mm/slub.c                  |  2 ++
>   mm/swap_state.c            |  4 ++--
>   mm/vmscan.c                |  4 ++--
>   mm/zswap.c                 |  4 ++--
>   12 files changed, 35 insertions(+), 39 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index ca30c391a894..b9fd85dfee9b 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3413,13 +3413,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   	 * should have access to this page, we're safe to simply set
>   	 * PG_locked without checking it first.
>   	 */
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	rc = add_to_page_cache_locked(page, mapping,
>   				      page->index, GFP_KERNEL);
>
>   	/* give up if we can't stick it in the cache */
>   	if (rc) {
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   		return rc;
>   	}
>
> @@ -3440,10 +3440,10 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   		if (*bytes + PAGE_CACHE_SIZE > rsize)
>   			break;
>
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (add_to_page_cache_locked(page, mapping, page->index,
>   								GFP_KERNEL)) {
> -			__clear_page_locked(page);
> +			__ClearPageLocked(page);
>   			break;
>   		}
>   		list_move_tail(&page->lru, tmplist);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 32ea62c0ad30..10bdde20b14c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -269,7 +269,7 @@ static inline struct page *compound_head_fast(struct page *page)
>   	return page;
>   }
>
> -TESTPAGEFLAG(Locked, locked, ANY)
> +__PAGEFLAG(Locked, locked, NO_TAIL)
>   PAGEFLAG(Error, error, ANY) TESTCLEARFLAG(Error, error, ANY)
>   PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
>   	__SETPAGEFLAG(Referenced, referenced, ANY)
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 4b3736f7065c..7c3790764795 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -426,18 +426,9 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>   				unsigned int flags);
>   extern void unlock_page(struct page *page);
>
> -static inline void __set_page_locked(struct page *page)
> -{
> -	__set_bit(PG_locked, &page->flags);
> -}
> -
> -static inline void __clear_page_locked(struct page *page)
> -{
> -	__clear_bit(PG_locked, &page->flags);
> -}
> -
>   static inline int trylock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
>   }
>
> @@ -490,9 +481,9 @@ extern int wait_on_page_bit_killable_timeout(struct page *page,
>
>   static inline int wait_on_page_locked_killable(struct page *page)
>   {
> -	if (PageLocked(page))
> -		return wait_on_page_bit_killable(page, PG_locked);
> -	return 0;
> +	if (!PageLocked(page))
> +		return 0;
> +	return wait_on_page_bit_killable(compound_head(page), PG_locked);
>   }
>
>   extern wait_queue_head_t *page_waitqueue(struct page *page);
> @@ -511,7 +502,7 @@ static inline void wake_up_page(struct page *page, int bit)
>   static inline void wait_on_page_locked(struct page *page)
>   {
>   	if (PageLocked(page))
> -		wait_on_page_bit(page, PG_locked);
> +		wait_on_page_bit(compound_head(page), PG_locked);
>   }
>
>   /*
> @@ -656,17 +647,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>
>   /*
>    * Like add_to_page_cache_locked, but used to add newly allocated pages:
> - * the page is new, so we can just run __set_page_locked() against it.
> + * the page is new, so we can just run __SetPageLocked() against it.
>    */
>   static inline int add_to_page_cache(struct page *page,
>   		struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
>   {
>   	int error;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
>   	if (unlikely(error))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	return error;
>   }
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 12548d03c11d..467768d4263b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -615,11 +615,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
>   	void *shadow = NULL;
>   	int ret;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	ret = __add_to_page_cache_locked(page, mapping, offset,
>   					 gfp_mask, &shadow);
>   	if (unlikely(ret))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	else {
>   		/*
>   		 * The page might have been evicted from cache only
> @@ -742,6 +742,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
>    */
>   void unlock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	VM_BUG_ON_PAGE(!PageLocked(page), page);
>   	clear_bit_unlock(PG_locked, &page->flags);
>   	smp_mb__after_atomic();
> @@ -806,18 +807,20 @@ EXPORT_SYMBOL_GPL(page_endio);
>    */
>   void __lock_page(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	__wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
> +	__wait_on_bit_lock(page_waitqueue(page_head), &wait, bit_wait_io,
>   							TASK_UNINTERRUPTIBLE);
>   }
>   EXPORT_SYMBOL(__lock_page);
>
>   int __lock_page_killable(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	return __wait_on_bit_lock(page_waitqueue(page), &wait,
> +	return __wait_on_bit_lock(page_waitqueue(page_head), &wait,
>   					bit_wait_io, TASK_KILLABLE);
>   }
>   EXPORT_SYMBOL_GPL(__lock_page_killable);
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 4162dce2eb44..23138e99a531 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1884,7 +1884,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
>
>   		SetPageDirty(new_page);
>   		__SetPageUptodate(new_page);
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   	}
>
>   	return new_page;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d487f8dc6d39..399eee44d13d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1136,7 +1136,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>   	/*
>   	 * We ignore non-LRU pages for good reasons.
>   	 * - PG_locked is only well defined for LRU pages and a few others
> -	 * - to avoid races with __set_page_locked()
> +	 * - to avoid races with __SetPageLocked()
>   	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>   	 * The check (unnecessarily) ignores LRU pages being isolated and
>   	 * walked by the page reclaim code, however that's not a big loss.
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6aa9a4222ea9..114602a68111 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1734,7 +1734,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>   		flush_tlb_range(vma, mmun_start, mmun_end);
>
>   	/* Prepare a page as a migration target */
> -	__set_page_locked(new_page);
> +	__SetPageLocked(new_page);
>   	SetPageSwapBacked(new_page);
>
>   	/* anon mapping, we can simply copy page->mapping to the new page: */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 80b360c7bcd1..2e2b943c8e62 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -981,7 +981,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>   	copy_highpage(newpage, oldpage);
>   	flush_dcache_page(newpage);
>
> -	__set_page_locked(newpage);
> +	__SetPageLocked(newpage);
>   	SetPageUptodate(newpage);
>   	SetPageSwapBacked(newpage);
>   	set_page_private(newpage, swap_index);
> @@ -1173,7 +1173,7 @@ repeat:
>   		}
>
>   		__SetPageSwapBacked(page);
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (sgp == SGP_WRITE)
>   			__SetPageReferenced(page);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2584d4ff02eb..f33ae2b7a5e7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -338,11 +338,13 @@ static inline int oo_objects(struct kmem_cache_order_objects x)
>    */
>   static __always_inline void slab_lock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	bit_spin_lock(PG_locked, &page->flags);
>   }
>
>   static __always_inline void slab_unlock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	__bit_spin_unlock(PG_locked, &page->flags);
>   }
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 405923f77334..d1c4a25b4362 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -357,7 +357,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -371,7 +371,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 260c413d39cd..dc6cd51577a6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1062,7 +1062,7 @@ unmap:
>   				VM_BUG_ON_PAGE(PageSwapCache(page), page);
>   				if (!page_freeze_refs(page, 1))
>   					goto keep_locked;
> -				__clear_page_locked(page);
> +				__ClearPageLocked(page);
>   				count_vm_event(PGLAZYFREED);
>   				goto free_it;
>   			}
> @@ -1174,7 +1174,7 @@ unmap:
>   		 * we obviously don't have to worry about waking up a process
>   		 * waiting on the page lock, because there are no references.
>   		 */
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   free_it:
>   		nr_reclaimed++;
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4249e82ff934..f8583f1fc938 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -490,7 +490,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -501,7 +501,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
>
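
The __lock_page(), trylock_page() and unlock_page() hunks quoted above all
resolve the page to its compound head before touching PG_locked. A minimal
userspace sketch of that idea follows. It is a model, not kernel code:
struct page_model, compound_head_model() and the *_model() helpers are
hypothetical names, and the first_page pointer is only an assumed way of
linking a tail page to its head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit 0 of flags stands in for PG_locked in this model. */
#define PG_LOCKED_MODEL_MASK 1UL

/* Simplified stand-in for struct page; not the kernel's layout. */
struct page_model {
	atomic_ulong flags;
	struct page_model *first_page;	/* NULL for a head or order-0 page */
};

/* Resolve any page of a compound page to its head page. */
static struct page_model *compound_head_model(struct page_model *page)
{
	return page->first_page ? page->first_page : page;
}

/* Try to take the lock; the bit always lives in the head page. */
static bool trylock_page_model(struct page_model *page)
{
	unsigned long old;

	page = compound_head_model(page);
	old = atomic_fetch_or(&page->flags, PG_LOCKED_MODEL_MASK);
	return !(old & PG_LOCKED_MODEL_MASK);
}

/* Release the lock through whichever page the caller holds. */
static void unlock_page_model(struct page_model *page)
{
	page = compound_head_model(page);
	atomic_fetch_and(&page->flags, ~PG_LOCKED_MODEL_MASK);
}

/* Report the lock state of the whole compound page. */
static bool page_locked_model(struct page_model *page)
{
	page = compound_head_model(page);
	return atomic_load(&page->flags) & PG_LOCKED_MODEL_MASK;
}
```

In this model, locking through any tail page makes the head and every other
tail page appear locked, and unlocking through a different tail page releases
the same lock, which is the behavior the patch description asks for.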


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 04/16] page-flags: define PG_locked behavior on compound pages
  2015-03-19 17:08   ` Kirill A. Shutemov
  (?)
@ 2015-03-27 15:13     ` Mateusz Krawczuk
  -1 siblings, 0 replies; 119+ messages in thread
From: Mateusz Krawczuk @ 2015-03-27 15:13 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, linux-next, sfr

Hi!

This patch has broken the linux-next build since 2015-03-25 on ARM using
exynos_defconfig with arm-linux-gnueabi-linaro_4.7.4-2014.04,
arm-linux-gnueabi-linaro_4.8.3-2014.04 and
arm-linux-gnueabi-4.7.3-12ubuntu1 (from Ubuntu 14.04 LTS). The compiler
shows this error message:
mm/migrate.c: In function ‘migrate_pages’:
mm/migrate.c:1148:1: internal compiler error: in push_minipool_fix, at config/arm/arm.c:13500
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.

It builds fine with arm-linux-gnueabi-linaro_4.9.1-2014.07.

Best Regards
Mateusz Krawczuk
Samsung R&D Institute Poland

On 19.03.2015 at 18:08, Kirill A. Shutemov wrote:
> lock_page() must operate on the whole compound page. It doesn't make
> much sense to lock part of a compound page. Change the code to use the
> head page's PG_locked if a tail page is passed.
>
> This patch also gets rid of the custom helper functions
> __set_page_locked() and __clear_page_locked(). They are replaced with
> helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Passing a tail page
> to these helpers would trigger VM_BUG_ON().
>
> SLUB uses PG_locked as a bit spin lock. IIUC, tail pages should never
> appear there. VM_BUG_ON() is added to make sure that this assumption is
> correct.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   fs/cifs/file.c             |  8 ++++----
>   include/linux/page-flags.h |  2 +-
>   include/linux/pagemap.h    | 25 ++++++++-----------------
>   mm/filemap.c               | 15 +++++++++------
>   mm/ksm.c                   |  2 +-
>   mm/memory-failure.c        |  2 +-
>   mm/migrate.c               |  2 +-
>   mm/shmem.c                 |  4 ++--
>   mm/slub.c                  |  2 ++
>   mm/swap_state.c            |  4 ++--
>   mm/vmscan.c                |  4 ++--
>   mm/zswap.c                 |  4 ++--
>   12 files changed, 35 insertions(+), 39 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index ca30c391a894..b9fd85dfee9b 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3413,13 +3413,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   	 * should have access to this page, we're safe to simply set
>   	 * PG_locked without checking it first.
>   	 */
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	rc = add_to_page_cache_locked(page, mapping,
>   				      page->index, GFP_KERNEL);
>
>   	/* give up if we can't stick it in the cache */
>   	if (rc) {
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   		return rc;
>   	}
>
> @@ -3440,10 +3440,10 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   		if (*bytes + PAGE_CACHE_SIZE > rsize)
>   			break;
>
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (add_to_page_cache_locked(page, mapping, page->index,
>   								GFP_KERNEL)) {
> -			__clear_page_locked(page);
> +			__ClearPageLocked(page);
>   			break;
>   		}
>   		list_move_tail(&page->lru, tmplist);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 32ea62c0ad30..10bdde20b14c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -269,7 +269,7 @@ static inline struct page *compound_head_fast(struct page *page)
>   	return page;
>   }
>
> -TESTPAGEFLAG(Locked, locked, ANY)
> +__PAGEFLAG(Locked, locked, NO_TAIL)
>   PAGEFLAG(Error, error, ANY) TESTCLEARFLAG(Error, error, ANY)
>   PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
>   	__SETPAGEFLAG(Referenced, referenced, ANY)
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 4b3736f7065c..7c3790764795 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -426,18 +426,9 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>   				unsigned int flags);
>   extern void unlock_page(struct page *page);
>
> -static inline void __set_page_locked(struct page *page)
> -{
> -	__set_bit(PG_locked, &page->flags);
> -}
> -
> -static inline void __clear_page_locked(struct page *page)
> -{
> -	__clear_bit(PG_locked, &page->flags);
> -}
> -
>   static inline int trylock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
>   }
>
> @@ -490,9 +481,9 @@ extern int wait_on_page_bit_killable_timeout(struct page *page,
>
>   static inline int wait_on_page_locked_killable(struct page *page)
>   {
> -	if (PageLocked(page))
> -		return wait_on_page_bit_killable(page, PG_locked);
> -	return 0;
> +	if (!PageLocked(page))
> +		return 0;
> +	return wait_on_page_bit_killable(compound_head(page), PG_locked);
>   }
>
>   extern wait_queue_head_t *page_waitqueue(struct page *page);
> @@ -511,7 +502,7 @@ static inline void wake_up_page(struct page *page, int bit)
>   static inline void wait_on_page_locked(struct page *page)
>   {
>   	if (PageLocked(page))
> -		wait_on_page_bit(page, PG_locked);
> +		wait_on_page_bit(compound_head(page), PG_locked);
>   }
>
>   /*
> @@ -656,17 +647,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>
>   /*
>    * Like add_to_page_cache_locked, but used to add newly allocated pages:
> - * the page is new, so we can just run __set_page_locked() against it.
> + * the page is new, so we can just run __SetPageLocked() against it.
>    */
>   static inline int add_to_page_cache(struct page *page,
>   		struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
>   {
>   	int error;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
>   	if (unlikely(error))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	return error;
>   }
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 12548d03c11d..467768d4263b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -615,11 +615,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
>   	void *shadow = NULL;
>   	int ret;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	ret = __add_to_page_cache_locked(page, mapping, offset,
>   					 gfp_mask, &shadow);
>   	if (unlikely(ret))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	else {
>   		/*
>   		 * The page might have been evicted from cache only
> @@ -742,6 +742,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
>    */
>   void unlock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	VM_BUG_ON_PAGE(!PageLocked(page), page);
>   	clear_bit_unlock(PG_locked, &page->flags);
>   	smp_mb__after_atomic();
> @@ -806,18 +807,20 @@ EXPORT_SYMBOL_GPL(page_endio);
>    */
>   void __lock_page(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	__wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
> +	__wait_on_bit_lock(page_waitqueue(page_head), &wait, bit_wait_io,
>   							TASK_UNINTERRUPTIBLE);
>   }
>   EXPORT_SYMBOL(__lock_page);
>
>   int __lock_page_killable(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	return __wait_on_bit_lock(page_waitqueue(page), &wait,
> +	return __wait_on_bit_lock(page_waitqueue(page_head), &wait,
>   					bit_wait_io, TASK_KILLABLE);
>   }
>   EXPORT_SYMBOL_GPL(__lock_page_killable);
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 4162dce2eb44..23138e99a531 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1884,7 +1884,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
>
>   		SetPageDirty(new_page);
>   		__SetPageUptodate(new_page);
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   	}
>
>   	return new_page;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d487f8dc6d39..399eee44d13d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1136,7 +1136,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>   	/*
>   	 * We ignore non-LRU pages for good reasons.
>   	 * - PG_locked is only well defined for LRU pages and a few others
> -	 * - to avoid races with __set_page_locked()
> +	 * - to avoid races with __SetPageLocked()
>   	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>   	 * The check (unnecessarily) ignores LRU pages being isolated and
>   	 * walked by the page reclaim code, however that's not a big loss.
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6aa9a4222ea9..114602a68111 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1734,7 +1734,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>   		flush_tlb_range(vma, mmun_start, mmun_end);
>
>   	/* Prepare a page as a migration target */
> -	__set_page_locked(new_page);
> +	__SetPageLocked(new_page);
>   	SetPageSwapBacked(new_page);
>
>   	/* anon mapping, we can simply copy page->mapping to the new page: */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 80b360c7bcd1..2e2b943c8e62 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -981,7 +981,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>   	copy_highpage(newpage, oldpage);
>   	flush_dcache_page(newpage);
>
> -	__set_page_locked(newpage);
> +	__SetPageLocked(newpage);
>   	SetPageUptodate(newpage);
>   	SetPageSwapBacked(newpage);
>   	set_page_private(newpage, swap_index);
> @@ -1173,7 +1173,7 @@ repeat:
>   		}
>
>   		__SetPageSwapBacked(page);
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (sgp == SGP_WRITE)
>   			__SetPageReferenced(page);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2584d4ff02eb..f33ae2b7a5e7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -338,11 +338,13 @@ static inline int oo_objects(struct kmem_cache_order_objects x)
>    */
>   static __always_inline void slab_lock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	bit_spin_lock(PG_locked, &page->flags);
>   }
>
>   static __always_inline void slab_unlock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	__bit_spin_unlock(PG_locked, &page->flags);
>   }
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 405923f77334..d1c4a25b4362 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -357,7 +357,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -371,7 +371,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 260c413d39cd..dc6cd51577a6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1062,7 +1062,7 @@ unmap:
>   				VM_BUG_ON_PAGE(PageSwapCache(page), page);
>   				if (!page_freeze_refs(page, 1))
>   					goto keep_locked;
> -				__clear_page_locked(page);
> +				__ClearPageLocked(page);
>   				count_vm_event(PGLAZYFREED);
>   				goto free_it;
>   			}
> @@ -1174,7 +1174,7 @@ unmap:
>   		 * we obviously don't have to worry about waking up a process
>   		 * waiting on the page lock, because there are no references.
>   		 */
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   free_it:
>   		nr_reclaimed++;
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4249e82ff934..f8583f1fc938 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -490,7 +490,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -501,7 +501,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
>
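
The quoted commit message says that the generated __SetPageLocked() and
__ClearPageLocked() helpers reject tail pages, and the add_to_page_cache()
hunk shows the usual pattern around them: lock the new page, try the
insertion, and drop the lock again on failure. The sketch below models both
in userspace. It rests on assumptions: struct mpage and the mpage_*() helpers
are hypothetical, and the kernel's VM_BUG_ON() is modelled as an -EINVAL
return so the tail-page case can be observed without aborting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit 0 of flags stands in for PG_locked in this model. */
#define MPAGE_LOCKED_MASK 1UL

/* Simplified stand-in for struct page; not the kernel's layout. */
struct mpage {
	unsigned long flags;
	struct mpage *first_page;	/* non-NULL only for tail pages */
};

static bool mpage_is_tail(const struct mpage *page)
{
	return page->first_page != NULL;
}

static bool mpage_locked(const struct mpage *page)
{
	return page->flags & MPAGE_LOCKED_MASK;
}

/* Non-atomic set: only valid while nobody else can see the page. */
static int mpage_set_locked(struct mpage *page)
{
	if (mpage_is_tail(page))
		return -EINVAL;		/* the kernel would VM_BUG_ON() here */
	page->flags |= MPAGE_LOCKED_MASK;
	return 0;
}

/* Non-atomic clear, with the same no-tail rule. */
static int mpage_clear_locked(struct mpage *page)
{
	if (mpage_is_tail(page))
		return -EINVAL;		/* the kernel would VM_BUG_ON() here */
	page->flags &= ~MPAGE_LOCKED_MASK;
	return 0;
}

/*
 * Shape of add_to_page_cache(): lock the new page, attempt the insertion
 * (simulated by insert_fails), and unlock again if the insertion failed.
 */
static int mpage_add_new(struct mpage *page, bool insert_fails)
{
	int error = mpage_set_locked(page);

	if (error)
		return error;
	error = insert_fails ? -ENOMEM : 0;
	if (error)
		mpage_clear_locked(page);
	return error;
}
```

On success the page stays locked for the caller; on a simulated -ENOMEM the
lock is cleared again; and a tail page is refused before any flag is touched.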


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 04/16] page-flags: define PG_locked behavior on compound pages
@ 2015-03-27 15:13     ` Mateusz Krawczuk
  0 siblings, 0 replies; 119+ messages in thread
From: Mateusz Krawczuk @ 2015-03-27 15:13 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, linux-next, sfr

Hi!

This patch has broken the linux-next build since 2015-03-25 on ARM using
exynos_defconfig with arm-linux-gnueabi-linaro_4.7.4-2014.04,
arm-linux-gnueabi-linaro_4.8.3-2014.04 and
arm-linux-gnueabi-4.7.3-12ubuntu1 (from Ubuntu 14.04 LTS). The compiler
shows this error message:
mm/migrate.c: In function ‘migrate_pages’:
mm/migrate.c:1148:1: internal compiler error: in push_minipool_fix, at config/arm/arm.c:13500
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.

It builds fine with arm-linux-gnueabi-linaro_4.9.1-2014.07.

Best Regards
Mateusz Krawczuk
Samsung R&D Institute Poland

On 19.03.2015 at 18:08, Kirill A. Shutemov wrote:
> lock_page() must operate on the whole compound page. It doesn't make
> much sense to lock part of a compound page. Change the code to use the
> head page's PG_locked if a tail page is passed.
>
> This patch also gets rid of the custom helper functions
> __set_page_locked() and __clear_page_locked(). They are replaced with
> helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Passing a tail page
> to these helpers would trigger VM_BUG_ON().
>
> SLUB uses PG_locked as a bit spin lock. IIUC, tail pages should never
> appear there. VM_BUG_ON() is added to make sure that this assumption is
> correct.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   fs/cifs/file.c             |  8 ++++----
>   include/linux/page-flags.h |  2 +-
>   include/linux/pagemap.h    | 25 ++++++++-----------------
>   mm/filemap.c               | 15 +++++++++------
>   mm/ksm.c                   |  2 +-
>   mm/memory-failure.c        |  2 +-
>   mm/migrate.c               |  2 +-
>   mm/shmem.c                 |  4 ++--
>   mm/slub.c                  |  2 ++
>   mm/swap_state.c            |  4 ++--
>   mm/vmscan.c                |  4 ++--
>   mm/zswap.c                 |  4 ++--
>   12 files changed, 35 insertions(+), 39 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index ca30c391a894..b9fd85dfee9b 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3413,13 +3413,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   	 * should have access to this page, we're safe to simply set
>   	 * PG_locked without checking it first.
>   	 */
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	rc = add_to_page_cache_locked(page, mapping,
>   				      page->index, GFP_KERNEL);
>
>   	/* give up if we can't stick it in the cache */
>   	if (rc) {
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   		return rc;
>   	}
>
> @@ -3440,10 +3440,10 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
>   		if (*bytes + PAGE_CACHE_SIZE > rsize)
>   			break;
>
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (add_to_page_cache_locked(page, mapping, page->index,
>   								GFP_KERNEL)) {
> -			__clear_page_locked(page);
> +			__ClearPageLocked(page);
>   			break;
>   		}
>   		list_move_tail(&page->lru, tmplist);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 32ea62c0ad30..10bdde20b14c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -269,7 +269,7 @@ static inline struct page *compound_head_fast(struct page *page)
>   	return page;
>   }
>
> -TESTPAGEFLAG(Locked, locked, ANY)
> +__PAGEFLAG(Locked, locked, NO_TAIL)
>   PAGEFLAG(Error, error, ANY) TESTCLEARFLAG(Error, error, ANY)
>   PAGEFLAG(Referenced, referenced, ANY) TESTCLEARFLAG(Referenced, referenced, ANY)
>   	__SETPAGEFLAG(Referenced, referenced, ANY)
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 4b3736f7065c..7c3790764795 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -426,18 +426,9 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>   				unsigned int flags);
>   extern void unlock_page(struct page *page);
>
> -static inline void __set_page_locked(struct page *page)
> -{
> -	__set_bit(PG_locked, &page->flags);
> -}
> -
> -static inline void __clear_page_locked(struct page *page)
> -{
> -	__clear_bit(PG_locked, &page->flags);
> -}
> -
>   static inline int trylock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
>   }
>
> @@ -490,9 +481,9 @@ extern int wait_on_page_bit_killable_timeout(struct page *page,
>
>   static inline int wait_on_page_locked_killable(struct page *page)
>   {
> -	if (PageLocked(page))
> -		return wait_on_page_bit_killable(page, PG_locked);
> -	return 0;
> +	if (!PageLocked(page))
> +		return 0;
> +	return wait_on_page_bit_killable(compound_head(page), PG_locked);
>   }
>
>   extern wait_queue_head_t *page_waitqueue(struct page *page);
> @@ -511,7 +502,7 @@ static inline void wake_up_page(struct page *page, int bit)
>   static inline void wait_on_page_locked(struct page *page)
>   {
>   	if (PageLocked(page))
> -		wait_on_page_bit(page, PG_locked);
> +		wait_on_page_bit(compound_head(page), PG_locked);
>   }
>
>   /*
> @@ -656,17 +647,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>
>   /*
>    * Like add_to_page_cache_locked, but used to add newly allocated pages:
> - * the page is new, so we can just run __set_page_locked() against it.
> + * the page is new, so we can just run __SetPageLocked() against it.
>    */
>   static inline int add_to_page_cache(struct page *page,
>   		struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
>   {
>   	int error;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
>   	if (unlikely(error))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	return error;
>   }
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 12548d03c11d..467768d4263b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -615,11 +615,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
>   	void *shadow = NULL;
>   	int ret;
>
> -	__set_page_locked(page);
> +	__SetPageLocked(page);
>   	ret = __add_to_page_cache_locked(page, mapping, offset,
>   					 gfp_mask, &shadow);
>   	if (unlikely(ret))
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   	else {
>   		/*
>   		 * The page might have been evicted from cache only
> @@ -742,6 +742,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
>    */
>   void unlock_page(struct page *page)
>   {
> +	page = compound_head(page);
>   	VM_BUG_ON_PAGE(!PageLocked(page), page);
>   	clear_bit_unlock(PG_locked, &page->flags);
>   	smp_mb__after_atomic();
> @@ -806,18 +807,20 @@ EXPORT_SYMBOL_GPL(page_endio);
>    */
>   void __lock_page(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	__wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
> +	__wait_on_bit_lock(page_waitqueue(page_head), &wait, bit_wait_io,
>   							TASK_UNINTERRUPTIBLE);
>   }
>   EXPORT_SYMBOL(__lock_page);
>
>   int __lock_page_killable(struct page *page)
>   {
> -	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> +	struct page *page_head = compound_head(page);
> +	DEFINE_WAIT_BIT(wait, &page_head->flags, PG_locked);
>
> -	return __wait_on_bit_lock(page_waitqueue(page), &wait,
> +	return __wait_on_bit_lock(page_waitqueue(page_head), &wait,
>   					bit_wait_io, TASK_KILLABLE);
>   }
>   EXPORT_SYMBOL_GPL(__lock_page_killable);
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 4162dce2eb44..23138e99a531 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1884,7 +1884,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
>
>   		SetPageDirty(new_page);
>   		__SetPageUptodate(new_page);
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   	}
>
>   	return new_page;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d487f8dc6d39..399eee44d13d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1136,7 +1136,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>   	/*
>   	 * We ignore non-LRU pages for good reasons.
>   	 * - PG_locked is only well defined for LRU pages and a few others
> -	 * - to avoid races with __set_page_locked()
> +	 * - to avoid races with __SetPageLocked()
>   	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>   	 * The check (unnecessarily) ignores LRU pages being isolated and
>   	 * walked by the page reclaim code, however that's not a big loss.
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6aa9a4222ea9..114602a68111 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1734,7 +1734,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>   		flush_tlb_range(vma, mmun_start, mmun_end);
>
>   	/* Prepare a page as a migration target */
> -	__set_page_locked(new_page);
> +	__SetPageLocked(new_page);
>   	SetPageSwapBacked(new_page);
>
>   	/* anon mapping, we can simply copy page->mapping to the new page: */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 80b360c7bcd1..2e2b943c8e62 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -981,7 +981,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>   	copy_highpage(newpage, oldpage);
>   	flush_dcache_page(newpage);
>
> -	__set_page_locked(newpage);
> +	__SetPageLocked(newpage);
>   	SetPageUptodate(newpage);
>   	SetPageSwapBacked(newpage);
>   	set_page_private(newpage, swap_index);
> @@ -1173,7 +1173,7 @@ repeat:
>   		}
>
>   		__SetPageSwapBacked(page);
> -		__set_page_locked(page);
> +		__SetPageLocked(page);
>   		if (sgp == SGP_WRITE)
>   			__SetPageReferenced(page);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2584d4ff02eb..f33ae2b7a5e7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -338,11 +338,13 @@ static inline int oo_objects(struct kmem_cache_order_objects x)
>    */
>   static __always_inline void slab_lock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	bit_spin_lock(PG_locked, &page->flags);
>   }
>
>   static __always_inline void slab_unlock(struct page *page)
>   {
> +	VM_BUG_ON_PAGE(PageTail(page), page);
>   	__bit_spin_unlock(PG_locked, &page->flags);
>   }
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 405923f77334..d1c4a25b4362 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -357,7 +357,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -371,7 +371,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 260c413d39cd..dc6cd51577a6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1062,7 +1062,7 @@ unmap:
>   				VM_BUG_ON_PAGE(PageSwapCache(page), page);
>   				if (!page_freeze_refs(page, 1))
>   					goto keep_locked;
> -				__clear_page_locked(page);
> +				__ClearPageLocked(page);
>   				count_vm_event(PGLAZYFREED);
>   				goto free_it;
>   			}
> @@ -1174,7 +1174,7 @@ unmap:
>   		 * we obviously don't have to worry about waking up a process
>   		 * waiting on the page lock, because there are no references.
>   		 */
> -		__clear_page_locked(page);
> +		__ClearPageLocked(page);
>   free_it:
>   		nr_reclaimed++;
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4249e82ff934..f8583f1fc938 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -490,7 +490,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>
>   		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
> -		__set_page_locked(new_page);
> +		__SetPageLocked(new_page);
>   		SetPageSwapBacked(new_page);
>   		err = __add_to_swap_cache(new_page, entry);
>   		if (likely(!err)) {
> @@ -501,7 +501,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   		}
>   		radix_tree_preload_end();
>   		ClearPageSwapBacked(new_page);
> -		__clear_page_locked(new_page);
> +		__ClearPageLocked(new_page);
>   		/*
>   		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
>   		 * clear SWAP_HAS_CACHE flag.
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 04/16] page-flags: define PG_locked behavior on compound pages
  2015-03-27 15:13     ` Mateusz Krawczuk
  (?)
@ 2015-03-27 16:37       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-03-27 16:37 UTC (permalink / raw)
  To: Mateusz Krawczuk
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli,
	Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm, linux-next, sfr

On Fri, Mar 27, 2015 at 04:13:08PM +0100, Mateusz Krawczuk wrote:
> Hi!
> 
> This patch has broken the linux-next build since 2015-03-25 on ARM using
> exynos_defconfig with arm-linux-gnueabi-linaro_4.7.4-2014.04,
> arm-linux-gnueabi-linaro_4.8.3-2014.04 and
> arm-linux-gnueabi-4.7.3-12ubuntu1 (from Ubuntu 14.04 LTS). The compiler
> shows this error message:
> mm/migrate.c: In function ‘migrate_pages’:
> mm/migrate.c:1148:1: internal compiler error: in push_minipool_fix, at
> config/arm/arm.c:13500
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
> 
> It builds fine with arm-linux-gnueabi-linaro_4.9.1-2014.07.

Obviously, you need to report a bug against your compiler. It's not a
kernel bug.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-03-19 17:08 ` Kirill A. Shutemov
@ 2015-07-15 20:20   ` Christoph Lameter
  -1 siblings, 0 replies; 119+ messages in thread
From: Christoph Lameter @ 2015-07-15 20:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:

> Currently we take a naive approach to page flags on compound pages -- we
> set the flag on the page without considering whether the flag makes sense
> for a tail page or for a compound page in general. This patchset tries to
> sort this out by defining a per-flag policy for what needs to be done when
> a page-flag helper operates on a compound page.

Well, we hand pointers to head pages around when handling compound pages.
References to tail pages are dicey and should only be used in a limited
way. At least that is true in the slab allocators, and that was my
understanding in earlier years. Therefore it does not make sense to
then check for tail pages.

> For now I have caught one case of illegal usage of page flags or ->mapping:
> the sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
> This leads to setting the dirty bit on tail pages and to accesses to
> tail_page's ->mapping. I don't see any bad behaviour caused by this, but
> it is worth fixing anyway.

Does this catch any errors?

> This patchset makes more sense if you take my THP refcounting into
> account: we will see more compound pages mapped with PTEs and we need to
> define behaviour of flags on compound pages to avoid bugs.

OK, that introduces the risk of pointers to tail pages becoming more of an
issue. But that does not affect non-pagecache pages.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 04/16] page-flags: define PG_locked behavior on compound pages
  2015-03-19 17:08   ` Kirill A. Shutemov
@ 2015-07-15 20:20     ` Christoph Lameter
  -1 siblings, 0 replies; 119+ messages in thread
From: Christoph Lameter @ 2015-07-15 20:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Vlastimil Babka, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:

> SLUB uses PG_locked as a bit spin lock. IIUC, tail pages should never
> appear there. VM_BUG_ON() is added to make sure that this assumption is
> correct.

It's really not worth checking that, AFAICT. Tail page pointers are not
used.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages
  2015-07-15 20:20   ` Christoph Lameter
@ 2015-07-15 21:18     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-07-15 21:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli,
	Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V,
	Johannes Weiner, Michal Hocko, Jerome Marchand, linux-kernel,
	linux-mm

On Wed, Jul 15, 2015 at 03:20:01PM -0500, Christoph Lameter wrote:
> On Thu, 19 Mar 2015, Kirill A. Shutemov wrote:
> 
> > Currently we take a naive approach to page flags on compound pages -- we
> > set the flag on the page without considering whether the flag makes sense
> > for a tail page or for a compound page in general. This patchset tries to
> > sort this out by defining a per-flag policy for what needs to be done when
> > a page-flag helper operates on a compound page.
> 
> Well, we hand pointers to head pages around when handling compound pages.
> References to tail pages are dicey and should only be used in a limited
> way. At least that is true in the slab allocators, and that was my
> understanding in earlier years. Therefore it does not make sense to
> then check for tail pages.

This is a preparation patchset for the THP refcounting rework. With the new
refcounting, sub-pages of a THP can be mapped with PTEs, therefore we will
see tail pages returned from pte_page().

I tried an ad-hoc approach to page flags wrt tail pages in earlier (pre
LSF/MM) revisions of the THP refcounting patchset. And IIRC, *you* pointed
out that it would be nice to have a more systematic approach.

And here's my attempt.

> > For now I have caught one case of illegal usage of page flags or ->mapping:
> > the sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
> > This leads to setting the dirty bit on tail pages and to accesses to
> > tail_page's ->mapping. I don't see any bad behaviour caused by this, but
> > it is worth fixing anyway.
> 
> Does this catch any errors?

It helped to catch the BUG fixed by c761471b58e6 (mm: avoid tail page
refcounting on non-THP compound pages) and helped with work on the
refcounting patchset.
 
> > This patchset makes more sense if you take my THP refcounting into
> > account: we will see more compound pages mapped with PTEs and we need to
> > define behaviour of flags on compound pages to avoid bugs.
> 
> > OK, that introduces the risk of pointers to tail pages becoming more of an
> > issue. But that does not affect non-pagecache pages.

We don't have huge pages in pagecache yet. The refcounting patchset only
affects anon-THP, and makes compound pages suitable for pagecache.

We also have PTE-mapped compound pages -- in the sound subsystem and some
drivers (framebuffer, etc.)

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* page-flags behavior on compound pages: a worry
  2015-03-19 17:08   ` Kirill A. Shutemov
@ 2015-08-06  4:15     ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-08-06  4:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Andrea Arcangeli, David Rientjes, Hugh Dickins,
	Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

Hi Kirill,

I had a nasty thought this morning.

Andrew had prodded me gently to re-examine my concerns with your
page-flags rework in mmotm.  I still dislike the bloat (my mm/built-in.o
text goes up from 478513 to 490183 bytes on a non-DEBUG_VM build); but I
was hoping to set that aside, to let us move forward.

But looking into the bloat led me to what seems a more serious issue
with it.  I'd tacked a little function on to the end of mm/filemap.c:

bool page_is_locked(struct page *page)
{
	return !!PageLocked(page);
}

which came out as:

0000000000003a60 <page_is_locked>:
    3a60:	48 8b 07             	mov    (%rdi),%rax
    3a63:	55                   	push   %rbp
    3a64:	48 89 e5             	mov    %rsp,%rbp

[instructions above same as without your patches; those below added by them]

    3a67:	f6 c4 80             	test   $0x80,%ah
    3a6a:	74 10                	je     3a7c <page_is_locked+0x1c>
    3a6c:	48 8b 47 30          	mov    0x30(%rdi),%rax
    3a70:	48 8b 17             	mov    (%rdi),%rdx
    3a73:	80 e6 80             	and    $0x80,%dh
    3a76:	48 0f 44 c7          	cmove  %rdi,%rax
    3a7a:	eb 03                	jmp    3a7f <page_is_locked+0x1f>
    3a7c:	48 89 f8             	mov    %rdi,%rax
    3a7f:	48 8b 00             	mov    (%rax),%rax

[instructions above added by your patches; those below same as before]

    3a82:	5d                   	pop    %rbp
    3a83:	83 e0 01             	and    $0x1,%eax
    3a86:	c3                   	retq   

The "and $0x80,%dh" looked superfluous at first, but of course it isn't:
it's from the smp_rmb() in David's 668f9abbd433 "mm: close PageTail race"
(a later commit refactors compound_head() but doesn't change the story).

And it's that race, or a worse race of that kind, that now worries me.
Relying on smp_wmb() and smp_rmb() may be all that was needed in the
case that David was fixing; and (I dare not look at them to audit!)
all uses of compound_head() in our current v4.2-rc tree may well be
safe, for this or that contingent reason in each place that it's used.

But there is no locking within compound_head(page) to make it safe
everywhere, yet your page-flags rework is changing a large number
of PageWhatever()s and SetPageWhatever()s and ClearPageWhatever()s
now to do a hidden compound_head(page) beneath the covers.

To be more specific: if preemption, or an interrupt, or entry to SMM
mode, or whatever, delays this thread somewhere in that compound_head()
sequence of instructions, how can we be sure that the "head" returned
by compound_head() is good?  We know the page was PageTail just before
looking up page->first_page, and we know it was PageTail just after,
but we don't know that it was PageTail throughout, and we don't know
whether page->first_page is even a good page pointer, or something
else from the private/ptl/slab_cache union.
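
A minimal userspace sketch of the sequence under discussion may help here.
It is a model under stated assumptions, not the kernel's code: struct page
is reduced to two fields, PG_TAIL and compound_head_model() are illustrative
names, bit 15 is chosen only to mirror the 0x80 test on %ah in the listing
above, and a C11 acquire fence stands in for smp_rmb().

```c
#include <stdatomic.h>
#include <stdint.h>

#define PG_TAIL (1UL << 15)	/* assumed bit position, for illustration only */

/* Userspace stand-in for struct page; not the kernel's definition. */
struct page {
	_Atomic unsigned long flags;
	/* Models the union: first_page for a tail, otherwise something else. */
	_Atomic uintptr_t first_page_or_other;
};

static int page_tail(struct page *page)
{
	unsigned long flags;

	flags = atomic_load_explicit(&page->flags, memory_order_relaxed);
	return (flags & PG_TAIL) != 0;
}

static struct page *compound_head_model(struct page *page)
{
	uintptr_t head;

	if (!page_tail(page))		/* first PageTail check */
		return page;

	head = atomic_load_explicit(&page->first_page_or_other,
				    memory_order_relaxed);
	/* Stand-in for smp_rmb(): order the load above before the recheck. */
	atomic_thread_fence(memory_order_acquire);

	if (page_tail(page))		/* second PageTail check */
		return (struct page *)head;
	return page;
}
```

The two page_tail() checks bracket the load of the head pointer, but nothing
in the function keeps the page a tail between them: a thread that stalls
after the first check can see the bit set again at the recheck even though
the page was split and reused in between.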

Of course it would be very rare for it to go wrong; and most callsites
will obviously be safe for this or that reason; though, sadly, none of
them safe from holding a reference to the tail page in question, since
its count is frozen at 0 and cannot be grabbed by get_page_unless_zero.

But I don't see how it can be safe to rely on compound_head() inside
a general purpose page-flag function, that we're all accustomed to
think of as a simple bitop, that can be applied without great care.

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-06  4:15     ` Hugh Dickins
@ 2015-08-06 15:33       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-08-06 15:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli,
	David Rientjes, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On Wed, Aug 05, 2015 at 09:15:57PM -0700, Hugh Dickins wrote:
> Hi Kirill,
> 
> I had a nasty thought this morning.
 
Tough day.

I'm trying to wrap my head around this mail, and I'm not sure I've
succeeded. :-|

> Andrew had prodded me gently to re-examine my concerns with your
> page-flags rework in mmotm.  I still dislike the bloat (my mm/built-in.o
> text goes up from 478513 to 490183 bytes on a non-DEBUG_VM build); but I
> was hoping to set that aside, to let us move forward.
> 
> But looking into the bloat led me to what seems a more serious issue
> with it.  I'd tacked a little function on to the end of mm/filemap.c:
> 
> bool page_is_locked(struct page *page)
> {
> 	return !!PageLocked(page);
> }
> 
> which came out as:
> 
> 0000000000003a60 <page_is_locked>:
>     3a60:	48 8b 07             	mov    (%rdi),%rax
>     3a63:	55                   	push   %rbp
>     3a64:	48 89 e5             	mov    %rsp,%rbp
> 
> [instructions above same as without your patches; those below added by them]
> 
>     3a67:	f6 c4 80             	test   $0x80,%ah
>     3a6a:	74 10                	je     3a7c <page_is_locked+0x1c>
>     3a6c:	48 8b 47 30          	mov    0x30(%rdi),%rax
>     3a70:	48 8b 17             	mov    (%rdi),%rdx
>     3a73:	80 e6 80             	and    $0x80,%dh
>     3a76:	48 0f 44 c7          	cmove  %rdi,%rax
>     3a7a:	eb 03                	jmp    3a7f <page_is_locked+0x1f>
>     3a7c:	48 89 f8             	mov    %rdi,%rax
>     3a7f:	48 8b 00             	mov    (%rax),%rax
> 
> [instructions above added by your patches; those below same as before]
> 
>     3a82:	5d                   	pop    %rbp
>     3a83:	83 e0 01             	and    $0x1,%eax
>     3a86:	c3                   	retq   
> 
> The "and $0x80,%dh" looked superfluous at first, but of course it isn't:
> it's from the smp_rmb() in David's 668f9abbd433 "mm: close PageTail race"
> (a later commit refactors compound_head() but doesn't change the story).
> 
> And it's that race, or a worse race of that kind, that now worries me.
> Relying on smp_wmb() and smp_rmb() may be all that was needed in the
> case that David was fixing; and (I dare not look at them to audit!)
> all uses of compound_head() in our current v4.2-rc tree may well be
> safe, for this or that contingent reason in each place that it's used.
> 
> But there is no locking within compound_head(page) to make it safe
> everywhere, yet your page-flags rework is changing a large number
> of PageWhatever()s and SetPageWhatever()s and ClearPageWhatever()s
> now to do a hidden compound_head(page) beneath the covers.
> 
> To be more specific: if preemption, or an interrupt, or entry to SMM
> mode, or whatever, delays this thread somewhere in that compound_head()
> sequence of instructions, how can we be sure that the "head" returned
> by compound_head() is good?  We know the page was PageTail just before
> looking up page->first_page, and we know it was PageTail just after,
> but we don't know that it was PageTail throughout, and we don't know
> whether page->first_page is even a good page pointer, or something
> else from the private/ptl/slab_cache union.

That looks like a very valid worry to me, for the current -mm tree.

But let's take my refcounting rework into the picture.

One thing it simplifies is protection against splitting. Once you've got a
reference to a page, it cannot be split under you. It makes PageTail() and
->first_page stable for most callsites.

We can access the page's flags under ptl without holding a reference to
the page. And that's fine: ptl protects against splitting too.

Fast GUP also has a way to protect against splitting.

IIUC, the only potentially problematic callsites left are the physical
memory scanners. That code requires an audit. I'll do that.

Am I missing something else?

> Of course it would be very rare for it to go wrong; and most callsites
> will obviously be safe for this or that reason; though, sadly, none of
> them safe from holding a reference to the tail page in question, since
> its count is frozen at 0 and cannot be grabbed by get_page_unless_zero.

Do you mean that grabbing the head page's ->_count is not enough to protect
against splitting and freeing of the tail page under you?

I know a patchset which solves this! ;)

> But I don't see how it can be safe to rely on compound_head() inside
> a general purpose page-flag function, that we're all accustomed to
> think of as a simple bitop, that can be applied without great care.
> 
> Hugh

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-06 15:33       ` Kirill A. Shutemov
@ 2015-08-06 19:24         ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-08-06 19:24 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Thu, 6 Aug 2015, Kirill A. Shutemov wrote:
> On Wed, Aug 05, 2015 at 09:15:57PM -0700, Hugh Dickins wrote:
> > Hi Kirill,
> > 
> > I had a nasty thought this morning.
>  
> Tough day.
> 
> I'm trying to wrap my head around this mail, and I'm not sure I've
> succeeded. :-|

Sorry for not being clearer.

> > To be more specific: if preemption, or an interrupt, or entry to SMM
> > mode, or whatever, delays this thread somewhere in that compound_head()
> > sequence of instructions, how can we be sure that the "head" returned
> > by compound_head() is good?  We know the page was PageTail just before
> > looking up page->first_page, and we know it was PageTail just after,
> > but we don't know that it was PageTail throughout, and we don't know
> > whether page->first_page is even a good page pointer, or something
> > else from the private/ptl/slab_cache union.
> 
> That looks like a very valid worry to me, for the current -mm tree.
> 
> But let's take my refcounting rework into the picture.

Okay, let's do so.  I get very confused trying to think based on two
alternative schemes at the same time, so I'm happy to assume your
THP refcounting rework (which certainly has plenty to like in it
from the point of view of cleanup - though at present I think the
mlock splitting leaves it with a net regression in functionality).

That does say that this page-flags rework should not go to Linus
separately from your refcounting rework: I don't think the issues
here are ever likely to break someone's bisection, so it's fine for
the one series to precede the other, but any release should contain
both or neither.

> 
> One thing it simplifies is protection against splitting. Once you've got a
> reference to a page, it cannot be split under you. It makes PageTail() and
> ->first_page stable for most callsites.

Yes, but since you cannot acquire a reference to a tail page itself
(since it has count 0 throughout), don't you mean there that you
already hold a reference to the head?

In which case, why bother to make all the PageFlags operations on
tails redirect to the head: if the caller must hold a reference to
the head, then the caller should apply PageFlags to that head, with
no need for compound_head() redirection inside the operation, just a
VM_BUG_ON(PageTail).

Or so it seems from the outside: perhaps that becomes unworkable
somehow once you try to implement it.
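
The two alternatives weighed above can be sketched side by side in
userspace C. This is only an illustration under assumptions: the struct,
the bit values, and the names page_locked_redirect() and
page_locked_strict() are hypothetical, and assert() stands in for
VM_BUG_ON(PageTail).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_LOCKED (1UL << 0)	/* illustrative bit values */
#define PG_TAIL   (1UL << 15)

/* Userspace stand-in for struct page; not the kernel's definition. */
struct page {
	unsigned long flags;
	struct page *first_page;	/* meaningful only while PG_TAIL is set */
};

static bool page_tail(const struct page *page)
{
	return (page->flags & PG_TAIL) != 0;
}

/* Alternative 1: the helper silently redirects a tail page to its head. */
static bool page_locked_redirect(struct page *page)
{
	if (page_tail(page))
		page = page->first_page;
	return (page->flags & PG_LOCKED) != 0;
}

/* Alternative 2: the caller must already pass the head; tails are a bug. */
static bool page_locked_strict(const struct page *page)
{
	assert(!page_tail(page));	/* stand-in for VM_BUG_ON(PageTail) */
	return (page->flags & PG_LOCKED) != 0;
}
```

The strict form leaves the flag test as a plain bit operation and moves
the head lookup to the caller, which is where the reference is held; the
redirecting form keeps callers simpler at the cost of a hidden lookup.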

> 
> We can access the page's flags under ptl without holding a reference to
> the page. And that's fine: ptl protects against splitting too.
> 
> Fast GUP also has a way to protect against splitting.

Yes and yes.  Perhaps it's those accesses under ptl which took you
in this compound_head-inside-PageFlags direction.  Fast GUP is easy
to do something special in, but there's probably a lot of scattered
PageFlags operations under ptl, which were tiresome to fiddle with
when you came to allow pte mappings of THP subpages.

> 
> IIUC, the only potentially problematic callsites left are the physical
> memory scanners. That code requires an audit. I'll do that.

Please.

>  
> Am I missing something else?

Probably not; but please check - and I'm afraid you've set things up
so that every use of a PageFlags operation needs to be thought about,
if only briefly.

It's certainly the physical approaches to a page (isolation, compaction,
formerly lumpy reclaim, are there others?  /proc things?) which have
always been very tricky to get right.

I think it was for those that David added the barriered double PageTail
checking.  I wonder if something extra special should be done just there,
in the physical scans; and the barriered double PageTail checking avoided
elsewhere, in the normal places that you reckon are safe already.

Mind you, shifting the unlikely PageTail handling out of line to a
called function would reduce the bloat considerably, then maybe it
wouldn't matter how complicated it gets for the general case.

> 
> > Of course it would be very rare for it to go wrong; and most callsites
> > will obviously be safe for this or that reason; though, sadly, none of
> > them safe from holding a reference to the tail page in question, since
> > its count is frozen at 0 and cannot be grabbed by get_page_unless_zero.
> 
> Do you mean that grabbing the head page's ->_count is not enough to protect
> against splitting and freeing of the tail page under you?

No, I mean that if you know head already then why are you bothering with
tail; and if you only have tail, then locating head in all the cases where
the PageFlags operation might be called may be unsafe in a few of them.

And that it's not possible to acquire a reference to the tail page to
make this safe.  But I accept your point above, that the existence of
a pte in a locked page table amounts to a stable reference, even though
it does not contribute to that tail page's reference count.

> 
> I know a patchset which solves this! ;)

Oh, and I know a patchset which avoids these problems completely,
by not using compound pages at all ;)

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-06 19:24         ` Hugh Dickins
@ 2015-08-06 20:45           ` Christoph Lameter
  -1 siblings, 0 replies; 119+ messages in thread
From: Christoph Lameter @ 2015-08-06 20:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On Thu, 6 Aug 2015, Hugh Dickins wrote:

> > I know a patchset which solves this! ;)
>
> Oh, and I know a patchset which avoids these problems completely,
> by not using compound pages at all ;)

Another dumb idea: Stop the insanity of splitting pages on the fly?
Splitting pages should work like page migration: Lock everything down and
ensure no one is using the page and then do it. That way the compound page
and its metadata are as stable as a regular page.
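
One possible reading of this suggestion, sketched in userspace C: the
split proceeds only if the caller holds the sole reference, and the count
is frozen at zero while the metadata changes. The names struct cpage and
try_split_model() are hypothetical, and the sketch assumes every other
user takes its reference with a "get unless zero" style operation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a compound page; not the kernel's struct. */
struct cpage {
	atomic_int count;	/* references to the compound page */
	bool is_compound;
};

/* Split only when nobody but the caller is using the page. */
static bool try_split_model(struct cpage *page)
{
	int expected = 1;	/* the caller's own reference */

	/* Freeze: succeeds only if the count is exactly the caller's one. */
	if (!atomic_compare_exchange_strong(&page->count, &expected, 0))
		return false;	/* someone else holds a reference: refuse */

	/* With the count at zero, no new reference can be taken. */
	page->is_compound = false;

	atomic_store(&page->count, 1);	/* unfreeze */
	return true;
}
```

A caller that gets false back simply leaves the page compound and may
retry later, much as a failed migration attempt would.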


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-06 19:24         ` Hugh Dickins
@ 2015-08-07 14:49           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-08-07 14:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli,
	David Rientjes, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > I'm trying to wrap my head around this mail, and I'm not sure I've
> > succeeded. :-|
> 
> Sorry for not being clearer.

Not your fault.

The problem you've pointed to is at the edge of my understanding of
concurrency.

> On Thu, 6 Aug 2015, Kirill A. Shutemov wrote:
> > > To be more specific: if preemption, or an interrupt, or entry to SMM
> > > mode, or whatever, delays this thread somewhere in that compound_head()
> > > sequence of instructions, how can we be sure that the "head" returned
> > > by compound_head() is good?  We know the page was PageTail just before
> > > looking up page->first_page, and we know it was PageTail just after,
> > > but we don't know that it was PageTail throughout, and we don't know
> > > whether page->first_page is even a good page pointer, or something
> > > else from the private/ptl/slab_cache union.
> > 
> > > That looks like a very valid worry to me, for the current -mm tree.
> > > 
> > > But let's take my refcounting rework into the picture.
> 
> Okay, let's do so.  I get very confused trying to think based on two
> alternative schemes at the same time, so I'm happy to assume your
> THP refcounting rework (which certainly has plenty to like in it
> from the point of view of cleanup - though at present I think the
> mlock splitting leaves it with a net regression in functionality).

The plan is to bring that back a bit later. The refcounting patchset is
huge enough as it is.

> That does say that this page-flags rework should not go to Linus
> separately from your refcounting rework: I don't think the issues
> here are ever likely to break someone's bisection, so it's fine for
> the one series to precede the other, but any release should contain
> both or neither.

Agreed.

> > One thing it simplifies is protection against splitting. Once you've got a
> > reference to a page, it cannot be split under you. It makes PageTail() and
> > ->first_page stable for most callsites.
> 
> Yes, but since you cannot acquire a reference to a tail page itself
> (since it has count 0 throughout), don't you mean there that you
> already hold a reference to the head?
> 
> In which case, why bother to make all the PageFlags operations on
> tails redirect to the head: if the caller must hold a reference to
> the head, then the caller should apply PageFlags to that head, with
> no need for compound_head() redirection inside the operation, just a
> VM_BUG_ON(PageTail).

get_page() and put_page() hide the fact that refcounting is applied to
the head page. And that's handy. Otherwise we would need to drag pointers
to two pages around on the caller side, instead of the one we have now.

The only special case is, again, get_page_unless_zero() users. They have to
deal with head vs. tail pages on their own. We have only a few such places,
and I believe that's manageable.
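
The point about get_page()/put_page() can be modelled in userspace. This
is only a sketch under assumptions: struct page here is a toy, and every
model_* name is hypothetical and not the kernel implementation. It shows
a pin taken through a tail landing on the head's counter, while the
get_page_unless_zero() analogue does no redirection and so always fails
on a tail whose count stays 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; field names loosely follow the thread. */
struct page {
	bool tail;               /* stands in for PageTail() */
	struct page *first_page; /* meaningful only while tail is set */
	atomic_int count;        /* stands in for ->_count */
};

static struct page *model_compound_head(struct page *page)
{
	return page->tail ? page->first_page : page;
}

/* The caller passes whatever page it has; the pin lands on the head. */
static void model_get_page(struct page *page)
{
	atomic_fetch_add(&model_compound_head(page)->count, 1);
}

/* Returns true when this call dropped the last reference. */
static bool model_put_page(struct page *page)
{
	struct page *head = model_compound_head(page);

	assert(atomic_load(&head->count) > 0);
	return atomic_fetch_sub(&head->count, 1) == 1;
}

/* No redirection here: a tail with count 0 can never be grabbed. */
static bool model_get_page_unless_zero(struct page *page)
{
	int old = atomic_load(&page->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->count, &old, old + 1))
			return true;
	}
	return false;
}
```

With a head at count 1 and a tail at count 0, model_get_page(&tail)
raises the head to 2 and leaves the tail at 0, which is why callers can
carry a single pointer.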

> Or so it seems from the outside: perhaps that becomes unworkable
> somehow once you try to implement it.
> 
> > 
> > We can access the page's flags under ptl, without having reference the
> > page. And that's fine: ptl protects against splitting too.
> > 
> > Fast GUP also have a way to protect against split.
> 
> Yes and yes.  Perhaps it's those accesses under ptl which took you
> in this compound_head-inside-PageFlags direction.  Fast GUP is easy
> to do something special in, but there's probably a lot of scattered
> PageFlags operations under ptl, which were tiresome to fiddle with
> when you came to allow pte mappings of THP subpages.

Right.

> > IIUC, the only potentially problematic callsites left are physical memory
> > scanners. This code requires audit. I'll do that.
> 
> Please.

I'll bring some report on this next week.

> > Do I miss something else?
> 
> Probably not; but please check - and I'm afraid you've set things up
> so that every use of a PageFlags operation needs to be thought about,
> if only briefly.
> 
> It's certainly the physical approaches to a page (isolation, compaction,
> formerly lumpy reclaim, are there others?  /proc things?) which have
> always been very tricky to get right.

I didn't think about /proc as a potential issue. Thanks.

> I think it was for those that David added the barriered double PageTail
> checking.  I wonder if something extra special should be done just there,
> in the physical scans; and the barriered double PageTail checking avoided
> elsewhere, in the normal places that you reckon are safe already.
> 
> Mind you, shifting the unlikely PageTail handling out of line to a
> called function would reduce the bloat considerably, then maybe it
> wouldn't matter how complicated it gets for the general case.

I'll try that.
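
One possible shape for the out-of-line suggestion, as a userspace sketch
only. The names are hypothetical, struct page is a toy, and the noinline
attribute and __builtin_expect are GCC/Clang extensions standing in for
the kernel's noinline and unlikely(). The inline part is just a flag test
and a call, so any extra rechecking can live in the slow function without
bloating every caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page. */
struct page {
	bool tail;               /* stands in for PageTail() */
	struct page *first_page; /* meaningful only while tail is set */
};

/* Rare case, kept out of line; extra checks would go here. */
__attribute__((noinline))
static struct page *model_compound_head_slow(struct page *page)
{
	struct page *head = page->first_page;

	/* Placeholder for whatever rechecking the real code would need. */
	assert(head != NULL);
	return head;
}

/* Fast path that every flag helper would inline: a test and a call. */
static inline struct page *model_compound_head(struct page *page)
{
	if (__builtin_expect(page->tail, 0))
		return model_compound_head_slow(page);
	return page;
}
```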

> > > Of course it would be very rare for it to go wrong; and most callsites
> > > will obviously be safe for this or that reason; though, sadly, none of
> > > them safe from holding a reference to the tail page in question, since
> > > its count is frozen at 0 and cannot be grabbed by get_page_unless_zero.
> > 
> > Do you mean that grabbing head page's ->_count is not enough to protect
> > against splitting and freeing tail page under you?
> 
> No, I mean that if you know head already then why are you bothering with
> tail; and if you only have tail, then locating head in all the cases where
> the PageFlags operation might be called may be unsafe in a few of them.

See above.

> And that it's not possible to acquire a reference to the tail page to
> make this safe.  But I accept your point above, that the existence of
> a pte in a locked page table amounts to a stable reference, even though
> it does not contribute to that tail page's reference count.
> 
> > 
> > I know a patchset which solves this! ;)
> 
> Oh, and I know a patchset which avoids these problems completely,
> by not using compound pages at all ;)

BTW, I haven't heard anything about the patchset for a while.
What's the status?

While optimizing rmap operations in my patchset (see PG_double_map), I found
that it would be very tricky to expand team pages to anon-THP without a
performance regression on the rmap side, due to the number of atomic ops it
requires.

Is there any clever approach to the issue?

Team pages are probably fine for file mappings due to the different
performance baseline. I'm less optimistic about anon-THP.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-06 20:45           ` Christoph Lameter
@ 2015-08-07 14:50             ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-08-07 14:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On Thu, Aug 06, 2015 at 03:45:31PM -0500, Christoph Lameter wrote:
> On Thu, 6 Aug 2015, Hugh Dickins wrote:
> 
> > > I know a patchset which solves this! ;)
> >
> > Oh, and I know a patchset which avoids these problems completely,
> > by not using compound pages at all ;)
> 
> Another dumb idea: Stop the insanity of splitting pages on the fly?
> Splitting pages should work like page migration: Lock everything down and
> ensure no one is using the page and then do it. That way the compound pages
> and its metadata are as stable as a regular page.
 
That's what I do in the refcounting patchset.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-07 14:50             ` Kirill A. Shutemov
@ 2015-08-07 15:28               ` Christoph Lameter
  -1 siblings, 0 replies; 119+ messages in thread
From: Christoph Lameter @ 2015-08-07 15:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On Fri, 7 Aug 2015, Kirill A. Shutemov wrote:

> On Thu, Aug 06, 2015 at 03:45:31PM -0500, Christoph Lameter wrote:
> > On Thu, 6 Aug 2015, Hugh Dickins wrote:
> >
> > > > I know a patchset which solves this! ;)
> > >
> > > Oh, and I know a patchset which avoids these problems completely,
> > > by not using compound pages at all ;)
> >
> > Another dumb idea: Stop the insanity of splitting pages on the fly?
> > Splitting pages should work like page migration: Lock everything down and
> > ensure no one is using the page and then do it. That way the compound pages
> > and its metadata are as stable as a regular page.
>
> That's what I do in refcounting patchset.

Looks like you make refcounting easier and avoid splitting in some cases,
maybe only splitting the pmd. But the fundamental issue still remains:
complexity is high since individual pages of a compound can be mapped and
unmapped in multiple processes.

The compound would need to be always treated as a single order N entity
in order to really get things simplified and make code cleaner.

Either all pages are mapped or none. Otherwise you have to manage
a schizophrenic view of pages. Sometimes an order N size entity is
managed and sometimes a base page size page which is a fraction of the
whole. Such a view of a memory object is pretty difficult to manage.







^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-07 15:28               ` Christoph Lameter
@ 2015-08-10 11:09                 ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-08-10 11:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On Fri, Aug 07, 2015 at 10:28:49AM -0500, Christoph Lameter wrote:
> On Fri, 7 Aug 2015, Kirill A. Shutemov wrote:
> 
> > On Thu, Aug 06, 2015 at 03:45:31PM -0500, Christoph Lameter wrote:
> > > On Thu, 6 Aug 2015, Hugh Dickins wrote:
> > >
> > > > > I know a patchset which solves this! ;)
> > > >
> > > > Oh, and I know a patchset which avoids these problems completely,
> > > > by not using compound pages at all ;)
> > >
> > > Another dumb idea: Stop the insanity of splitting pages on the fly?
> > > Splitting pages should work like page migration: Lock everything down and
> > > ensure no one is using the page and then do it. That way the compound pages
> > > and its metadata are as stable as a regular page.
> >
> > That's what I do in refcounting patchset.
> 
> Looks like you make refcounting easier and avoid splitting in some cases
> maybe only splitting the pmd. But the fundamental issue still remains.
> Complexity is high since individual pages of a compound can be mapped and
> unmapped in multiple processes.
> 
> The compound would need to be always treated as a single order N entity
> in order to really get things simplified and make code cleaner.
> 
> Either all pages are mapped or none. Otherwise you have to manage the
> a schizoprenic view of pages. Sometimes an order N size entity is
> managed and sometimes a base page size page which is a fraction of the
> whole. Such a view of a memory object is pretty difficult to manage.

I don't see anything actionable here. Your wish list doesn't cope with
reality. Compound pages have been mapped with PTEs for almost ten years and
I don't see why we should stop the practice.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-10 11:09                 ` Kirill A. Shutemov
@ 2015-08-10 13:50                   ` Christoph Lameter
  -1 siblings, 0 replies; 119+ messages in thread
From: Christoph Lameter @ 2015-08-10 13:50 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On Mon, 10 Aug 2015, Kirill A. Shutemov wrote:

> I don't see anything actionable here. Your wish list doesn't cope with
> reality. Compound pages are mapped with PTEs for almost ten years and I
> don't see why we should stop the practice.

Well, they have to be if they are smaller than huge pages. Treating each
PTE as having its own state, instead of having the whole compound mapped
completely, causes the problem. Refcounting in tail pages is not necessary
if the whole compound is either mapped or not mapped at all by a process.
Refcounting in tail pages is only necessary if you allow 4k slices to be
mapped.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-06 19:24         ` Hugh Dickins
@ 2015-08-12 14:35           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-08-12 14:35 UTC (permalink / raw)
  To: Hugh Dickins, David Rientjes, Vlastimil Babka
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Dave Hansen,
	Mel Gorman, Rik van Riel, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > IIUC, the only potentially problematic callsites left are physical memory
> > scanners. This code requires audit. I'll do that.
> 
> Please.

I haven't finished the exercise yet. But here's an issue I believe is
present in the current *Linus* tree:

From e78eec7d7a8c4cba8b5952a997973f7741e704f4 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Wed, 12 Aug 2015 17:09:16 +0300
Subject: [PATCH] mm: fix potential race in isolate_migratepages_block()

Hugh has pointed out that a compound_head() call can be unsafe in some
contexts. Here's one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail->first_page = NULL
      head = tail->first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail->first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    <head == NULL dereferencing>

The race is purely theoretical. I don't think it's possible to trigger it
in practice. But who knows.

This can be fixed by avoiding compound_head() in unsafe context.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 86f04e556f96..bec727b700d3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -787,7 +787,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * admittedly racy check.
 		 */
 		if (!page_mapping(page) &&
-		    page_count(page) > page_mapcount(page))
+		    atomic_read(&page->_count) > page_mapcount(page))
 			continue;
 
 		/* If we already hold the lock, we can skip some rechecking */
-- 
 Kirill A. Shutemov
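
The interleaving in the changelog above can be replayed deterministically
in userspace. This is a single-threaded sketch under assumptions, not
kernel code: struct page is a toy, replay_compound_head() is a
hypothetical name, and clearing the tail flag on free is assumed, since
the diagram shows only the first_page store. CPU1's stores are written
inline at the points where the diagram places them, between the two
PageTail() checks of a compound_head()-style lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page. */
struct page {
	bool tail;               /* stands in for PageTail() */
	struct page *first_page; /* meaningful only while tail is set */
};

/*
 * Replays the CPU0/CPU1 interleaving on one thread. new_head is the
 * head of the compound page that CPU1 allocates afterwards.
 */
static struct page *replay_compound_head(struct page *page,
					 struct page *new_head)
{
	struct page *head;

	assert(page != NULL);

	if (!page->tail)             /* CPU0: first PageTail() check */
		return page;

	page->tail = false;          /* CPU1: put_page() frees the page */
	page->first_page = NULL;     /*       (flag clearing is assumed) */

	head = page->first_page;     /* CPU0: loads NULL */

	page->first_page = new_head; /* CPU1: prep_compound_page() */
	page->tail = true;           /*       __SetPageTail() */

	if (page->tail)              /* CPU0: second check passes too */
		return head;         /* NULL: the caller would crash on it */
	return page;
}
```

Both checks see a tail page, yet the returned head is NULL, which is the
dereference the diagram ends with. Not calling compound_head() at all in
this context sidesteps the window.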

^ permalink raw reply related	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-12 14:35           ` Kirill A. Shutemov
@ 2015-08-12 14:47             ` Vlastimil Babka
  -1 siblings, 0 replies; 119+ messages in thread
From: Vlastimil Babka @ 2015-08-12 14:47 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins, David Rientjes
  Cc: Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Dave Hansen,
	Mel Gorman, Rik van Riel, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On 08/12/2015 04:35 PM, Kirill A. Shutemov wrote:
> On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
>>> IIUC, the only potentially problematic callsites left are physical memory
>>> scanners. This code requires audit. I'll do that.
>>
>> Please.
>
> I haven't finished the exercise yet. But here's an issue I believe present
> in current *Linus* tree:
>
>  From e78eec7d7a8c4cba8b5952a997973f7741e704f4 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Wed, 12 Aug 2015 17:09:16 +0300
> Subject: [PATCH] mm: fix potential race in isolate_migratepages_block()
>
> Hugh has pointed that compound_head() call can be unsafe in some context.
> There's one example:
>
> 	CPU0					CPU1
>
> isolate_migratepages_block()
>    page_count()
>      compound_head()
>        !!PageTail() == true
> 					put_page()
> 					  tail->first_page = NULL
>        head = tail->first_page
> 					alloc_pages(__GFP_COMP)
> 					   prep_compound_page()
> 					     tail->first_page = head
> 					     __SetPageTail(p);
>        !!PageTail() == true
>      <head == NULL dereferencing>
>
> The race is pure theoretical. I don't it's possible to trigger it in
> practice. But who knows.

It's even less probable thanks to the fact that before this check we 
determined it's a PageLRU (and thus !PageTail).

>
> This can be fixed by avoiding compound_head() in unsafe context.

This is OK because if the page becomes a tail and we read a zero page
count, it's not fatal.
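
A rough userspace sketch of that difference, with a toy struct page and
hypothetical model_* names (none of this is the kernel code). The
page_count() analogue chases first_page and so inherits the race; the
raw read touches only the page's own counter, and the worst it can do is
return 0 for a page that has turned into a tail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page. */
struct page {
	bool tail;               /* stands in for PageTail() */
	struct page *first_page; /* meaningful only while tail is set */
	atomic_int count;        /* stands in for ->_count */
};

/* page_count() analogue: follows first_page before reading. */
static int model_page_count(struct page *page)
{
	struct page *head = page->tail ? page->first_page : page;

	assert(head != NULL);    /* the racy lookup could violate this */
	return atomic_load(&head->count);
}

/* What the patch reads instead: no pointer is followed. */
static int model_raw_count(struct page *page)
{
	return atomic_load(&page->count);
}

/* Mirrors the compaction check: true means "skip, probably pinned". */
static bool model_skip_pinned(struct page *page, int mapcount)
{
	return model_raw_count(page) > mapcount;
}
```

For a tail with count 0 and mapcount 0, the check simply does not skip
the page; nothing is dereferenced through a possibly stale first_page.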

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Fixes: 119d6d59dc ("mm, compaction: avoid isolating pinned pages")

Potentially stable 3.15+ if theoretical races qualify. They don't per 
stable rules, but we seem to be bending that a lot anyway.

> Cc: Hugh Dickins <hughd@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/compaction.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 86f04e556f96..bec727b700d3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -787,7 +787,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   		 * admittedly racy check.
>   		 */
>   		if (!page_mapping(page) &&
> -		    page_count(page) > page_mapcount(page))
> +		    atomic_read(&page->_count) > page_mapcount(page))
>   			continue;
>
>   		/* If we already hold the lock, we can skip some rechecking */
>


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-12 14:35           ` Kirill A. Shutemov
@ 2015-08-12 21:16             ` Andrew Morton
  -1 siblings, 0 replies; 119+ messages in thread
From: Andrew Morton @ 2015-08-12 21:16 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, David Rientjes, Vlastimil Babka,
	Kirill A. Shutemov, Andrea Arcangeli, Dave Hansen, Mel Gorman,
	Rik van Riel, Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On Wed, 12 Aug 2015 17:35:09 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > > IIUC, the only potentially problematic callsites left are physical memory
> > > scanners. This code requires audit. I'll do that.
> > 
> > Please.
> 
> I haven't finished the exercise yet. But here's an issue I believe present
> in current *Linus* tree:
> 
> From e78eec7d7a8c4cba8b5952a997973f7741e704f4 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Wed, 12 Aug 2015 17:09:16 +0300
> Subject: [PATCH] mm: fix potential race in isolate_migratepages_block()
> 
> Hugh has pointed out that a compound_head() call can be unsafe in some
> contexts. Here's one example:
> 
> 	CPU0					CPU1
> 
> isolate_migratepages_block()
>   page_count()
>     compound_head()
>       !!PageTail() == true
> 					put_page()
> 					  tail->first_page = NULL
>       head = tail->first_page
> 					alloc_pages(__GFP_COMP)
> 					   prep_compound_page()
> 					     tail->first_page = head
> 					     __SetPageTail(p);
>       !!PageTail() == true
>     <head == NULL dereferencing>
> 
> The race is purely theoretical. I don't think it's possible to trigger it
> in practice. But who knows.
> 
> This can be fixed by avoiding compound_head() in unsafe context.

This is nuts :( page_count() should Just Work without us having to
worry about bizarre races against splitting.  Sigh.

> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -787,7 +787,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		 * admittedly racy check.
>  		 */
>  		if (!page_mapping(page) &&
> -		    page_count(page) > page_mapcount(page))
> +		    atomic_read(&page->_count) > page_mapcount(page))
>  			continue;

If we're going to do this sort of thing, can we please do it in a more
transparent manner?  Let's not sprinkle unexplained and
incomprehensible direct accesses to ->_count all over the place.

Create a formal function to do this, with an appropriate name and with
documentation which fully explains what's going on.  Then use that
here, and in has_unmovable_pages() (at least).
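
One possible shape for such a helper, sketched as a user-space model rather
than kernel code. The names page_count_raw(), page_mapcount_model(),
looks_pinned() and struct page_model are hypothetical, and C11 atomics stand
in for the kernel's atomic_read(). The point is only that the raw ->_count
read is wrapped once, with the reasoning documented next to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct page; NOT the kernel layout. */
struct page_model {
	atomic_int _count;     /* reference count of this struct page */
	atomic_int _mapcount;  /* number of mappings minus one */
	void *mapping;         /* NULL models !page_mapping(page) */
};

/*
 * page_count_raw() - hypothetical helper name.
 *
 * Reads the count stored in this struct page itself and never follows a
 * head pointer. Meant for physical memory scanners that hold no reference
 * on the page: for them the page may be freed and reallocated as part of a
 * compound page at any moment, so looking up the head page is unsafe. The
 * result can be stale or zero, which such callers must tolerate.
 */
static inline int page_count_raw(struct page_model *page)
{
	return atomic_load_explicit(&page->_count, memory_order_relaxed);
}

/* Number of mappings; the stored value is offset by one. */
static inline int page_mapcount_model(struct page_model *page)
{
	return atomic_load_explicit(&page->_mapcount, memory_order_relaxed) + 1;
}

/* Mirrors the admittedly racy check in the patch. */
static inline bool looks_pinned(struct page_model *page)
{
	assert(page != NULL);
	return page->mapping == NULL &&
	       page_count_raw(page) > page_mapcount_model(page);
}
```

A caller in the scanner would then read "if (looks_pinned(page)) continue;"
and the justification for bypassing the head lookup lives in one place.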

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-12 21:16             ` Andrew Morton
@ 2015-08-12 22:21               ` Kirill A. Shutemov
  -1 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-08-12 22:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, David Rientjes, Vlastimil Babka,
	Kirill A. Shutemov, Andrea Arcangeli, Dave Hansen, Mel Gorman,
	Rik van Riel, Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On Wed, Aug 12, 2015 at 02:16:44PM -0700, Andrew Morton wrote:
> On Wed, 12 Aug 2015 17:35:09 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > > > IIUC, the only potentially problematic callsites left are physical memory
> > > > scanners. This code requires audit. I'll do that.
> > > 
> > > Please.
> > 
> > I haven't finished the exercise yet. But here's an issue I believe present
> > in current *Linus* tree:
> > 
> > From e78eec7d7a8c4cba8b5952a997973f7741e704f4 Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Date: Wed, 12 Aug 2015 17:09:16 +0300
> > Subject: [PATCH] mm: fix potential race in isolate_migratepages_block()
> > 
> > Hugh has pointed out that a compound_head() call can be unsafe in some
> > contexts. Here's one example:
> > 
> > 	CPU0					CPU1
> > 
> > isolate_migratepages_block()
> >   page_count()
> >     compound_head()
> >       !!PageTail() == true
> > 					put_page()
> > 					  tail->first_page = NULL
> >       head = tail->first_page
> > 					alloc_pages(__GFP_COMP)
> > 					   prep_compound_page()
> > 					     tail->first_page = head
> > 					     __SetPageTail(p);
> >       !!PageTail() == true
> >     <head == NULL dereferencing>
> > 
> > The race is purely theoretical. I don't think it's possible to trigger it
> > in practice. But who knows.
> > 
> > This can be fixed by avoiding compound_head() in unsafe context.
> 
> This is nuts :( page_count() should Just Work without us having to
> worry about bizarre races against splitting.  Sigh.

Split is not involved. And this race is present even for THP=n. :(

> 
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -787,7 +787,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >  		 * admittedly racy check.
> >  		 */
> >  		if (!page_mapping(page) &&
> > -		    page_count(page) > page_mapcount(page))
> > +		    atomic_read(&page->_count) > page_mapcount(page))
> >  			continue;
> 
> If we're going to do this sort of thing, can we please do it in a more
> transparent manner?  Let's not sprinkle unexplained and
> incomprehensible direct accesses to ->_count all over the place.
> 
> Create a formal function to do this, with an appropriate name and with
> documentation which fully explains what's going on.  Then use that
> here, and in has_unmovable_pages() (at least).

This whole situation is ugly. I'm thinking about a more general solution
for the PageTail() vs. ->first_page race.

We would be able to avoid the race in the first place if we encoded
PageTail() and the position of the head page within the same word in
struct page. This way we update both things in one shot, without the
possibility of a race.

Details get tricky.

I'm going to try something like this tomorrow: encode the position of the
head as an offset from the tail page and store it as a negative number in
the union with ->mapping and ->s_mem. PageTail() can be implemented as a
check that the value of the field is in the range -1..-MAX_ORDER_NR_PAGES.
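
A rough user-space model of that encoding, assuming nothing about how the
real patch will look. struct page_model, set_tail(), clear_tail(),
page_tail_model(), compound_head_model() and the value chosen for
MODEL_MAX_ORDER_NR_PAGES are all hypothetical. What it shows is that a
single load of one word answers both "is this a tail?" and "where is the
head?", so there is no window between the two.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ORDER_NR_PAGES 1024L  /* assumed; config-dependent in reality */

/* Simplified stand-in for struct page; NOT the kernel layout. */
struct page_model {
	atomic_intptr_t word;  /* models the union with ->mapping / ->s_mem */
};

/* Mark 'tail' as a tail page of 'head' with one store. */
static inline void set_tail(struct page_model *tail, struct page_model *head)
{
	ptrdiff_t off = tail - head;

	assert(off >= 1 && off <= MODEL_MAX_ORDER_NR_PAGES);
	atomic_store_explicit(&tail->word, -(intptr_t)off, memory_order_release);
}

/* Turn the page back into an ordinary page with one store. */
static inline void clear_tail(struct page_model *page)
{
	atomic_store_explicit(&page->word, 0, memory_order_release);
}

/* A value in -1..-MODEL_MAX_ORDER_NR_PAGES means "tail page". */
static inline bool word_is_tail(intptr_t v)
{
	return v <= -1 && v >= -MODEL_MAX_ORDER_NR_PAGES;
}

static inline bool page_tail_model(struct page_model *page)
{
	return word_is_tail(atomic_load_explicit(&page->word,
						 memory_order_acquire));
}

/* One load decides both tail-ness and the head position. */
static inline struct page_model *compound_head_model(struct page_model *page)
{
	intptr_t v = atomic_load_explicit(&page->word, memory_order_acquire);

	return word_is_tail(v) ? page + v : page;
}
```

In this model a reader sees either the old word or the new one, never a
tail flag paired with a stale head pointer.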

I'm not sure at all that it's going to work, especially looking at the
ridiculously high CONFIG_FORCE_MAX_ZONEORDER values some architectures allow.

We could also try to encode the page order instead (again as a negative
number) and calculate the head page position based on alignment...
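
The order-based variant could look roughly like this, under the assumption
that a compound page of order N starts at a page frame number aligned to
2^N, as buddy allocations are. encode_tail_order(), word_is_tail_order(),
head_pfn() and MODEL_MAX_ORDER are hypothetical names and values; this is
plain arithmetic on page frame numbers, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER 11  /* assumed; config-dependent in reality */

/* Encode the compound order of a tail page as a small negative number. */
static inline long encode_tail_order(unsigned int order)
{
	assert(order >= 1 && order < MODEL_MAX_ORDER);
	return -(long)order;
}

/* A value in -1..-(MODEL_MAX_ORDER - 1) means "tail page". */
static inline bool word_is_tail_order(long word)
{
	return word <= -1 && word > -(long)MODEL_MAX_ORDER;
}

/* Recover the head's pfn by rounding the tail's pfn down to 2^order. */
static inline unsigned long head_pfn(unsigned long pfn, long word)
{
	unsigned int order;

	if (!word_is_tail_order(word))
		return pfn;  /* not a tail: the page is its own head */

	order = (unsigned int)(-word);
	return pfn & ~((1UL << order) - 1);
}
```

For example, a tail at pfn 0x1235 in an order-9 compound page maps back to
head pfn 0x1200. The encoded value stays small regardless of how large the
maximum order is, at the cost of depending on the alignment assumption.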

Any other ideas are welcome.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-12 22:21               ` Kirill A. Shutemov
@ 2015-08-13  4:12                 ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-08-13  4:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Greg Thelen, Hugh Dickins, David Rientjes,
	Vlastimil Babka, Kirill A. Shutemov, Andrea Arcangeli,
	Dave Hansen, Mel Gorman, Rik van Riel, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Thu, 13 Aug 2015, Kirill A. Shutemov wrote:
> 
> This whole situation is ugly. I'm thinking about a more general solution
> for the PageTail() vs. ->first_page race.
> 
> We would be able to avoid the race in the first place if we encoded
> PageTail() and the position of the head page within the same word in
> struct page. This way we update both things in one shot, without the
> possibility of a race.
> 
> Details get tricky.
> 
> I'm going to try something like this tomorrow: encode the position of the
> head as an offset from the tail page and store it as a negative number in
> the union with ->mapping and ->s_mem. PageTail() can be implemented as a
> check that the value of the field is in the range -1..-MAX_ORDER_NR_PAGES.
> 
> I'm not sure at all that it's going to work, especially looking at the
> ridiculously high CONFIG_FORCE_MAX_ZONEORDER values some architectures allow.
> 
> We could also try to encode the page order instead (again as a negative
> number) and calculate the head page position based on alignment...
> 
> Any other ideas are welcome.

Good luck, I've not given it any thought, but hope it works out:
my reasoning was the same when I put the PageAnon bit into
page->mapping instead of page->flags.

Something to beware of though: although exceedingly unlikely to be a
problem, page->mapping always contained a pointer to or into a relevant
structure, or else something that could not possibly be a kernel pointer,
when I was working on KSM swapping: see comment above get_ksm_page() in
mm/ksm.c.  It is best to keep page->mapping for pointers if possible
(and probably avoid having the PageAnon bit set unless really Anon).

I've only just read your mail, and I'm too slow a thinker to have
worked through your isolate_migratepages_block() race yet.  But, given
the timing, cannot resist sending you a code fragment I wrote earlier
today for our v3.11-based kernel: which still has compound_trans_order(),
which we had been using in a similar racy physical scan.

I'm not for a moment suggesting that this fragment is relevant to your
race; but it is something amusing to consider when you're thinking of
such races.  Credit to Greg Thelen for thinking of the prep_compound_page()
end of it, when I'd been focussed on the __split_huge_page_refcount() end.

	/*
	 * It is not safe to use compound_lock (inside compound_trans_order)
	 * until we have a reference on the page (okay, done above) and have
	 * then seen PageLRU on it (just below): because mm/huge_memory.c uses
	 * the non-atomic __SetPageUptodate on a freshly allocated THPage in
	 * several places, believing it to be invisible to the outside world,
	 * but liable to race and leave PG_compound_lock set when cleared here.
	 */
	nr_pages = 1;
	if (PageHead(page)) {
		/*
		 * smp_rmb() against the smp_wmb() in the first iteration of
		 * prep_compound_page(), so that the PageTail test ensures
		 * that compound_order(page) is now correctly readable.
		 */
		smp_rmb();
		if (PageTail(page + 1)) {
			nr_pages = 1 << compound_order(page);
			/*
			 * Then smp_rmb() against smp_wmb() in last iteration of
			 * __split_huge_page_refcount(), to ensure that has not
			 * yet written something else into page[1].lru.prev.
			 */
			smp_rmb();
			if (!PageTail(page + 1))
				nr_pages = 1;
		}
	}
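
The barrier pairing in that fragment can be modelled in user space roughly
as follows. This is only an analogue: C11 release/acquire fences stand in
for smp_wmb()/smp_rmb(), struct compound_model and the *_model() functions
are hypothetical, the PageHead() test is left out, and the split side simply
writes 0 where the real code would reuse the field for something else.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct compound_model {
	atomic_uint order;  /* models the order stored in page[1] */
	atomic_bool tail;   /* models PageTail(page + 1) */
};

/* Writer, allocation side: publish the order, then set the tail flag. */
static void prep_model(struct compound_model *c, unsigned int order)
{
	assert(order < 32);
	atomic_store_explicit(&c->order, order, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);  /* smp_wmb() analogue */
	atomic_store_explicit(&c->tail, true, memory_order_relaxed);
}

/* Writer, split side: clear the tail flag, then reuse the order field. */
static void split_model(struct compound_model *c)
{
	atomic_store_explicit(&c->tail, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);  /* smp_wmb() analogue */
	atomic_store_explicit(&c->order, 0, memory_order_relaxed);
}

/* Reader: check the flag, read the order, then check the flag again. */
static unsigned long nr_pages_model(struct compound_model *c)
{
	unsigned long nr_pages = 1;

	if (atomic_load_explicit(&c->tail, memory_order_relaxed)) {
		/* Pairs with prep_model(): the order is readable now. */
		atomic_thread_fence(memory_order_acquire);
		nr_pages = 1UL << atomic_load_explicit(&c->order,
						       memory_order_relaxed);
		/* Pairs with split_model(): detect a clobbered order. */
		atomic_thread_fence(memory_order_acquire);
		if (!atomic_load_explicit(&c->tail, memory_order_relaxed))
			nr_pages = 1;
	}
	return nr_pages;
}
```

The second flag test is what makes the read trustworthy: if the reader
picked up a value written after the split cleared the flag, the re-check
sees the flag cleared and falls back to a single page.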

Hugh

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
  2015-08-07 14:49           ` Kirill A. Shutemov
@ 2015-08-13  5:10             ` Hugh Dickins
  -1 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-08-13  5:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Fri, 7 Aug 2015, Kirill A. Shutemov wrote:
> On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > 
> > Oh, and I know a patchset which avoids these problems completely,
> > by not using compound pages at all ;)
> 
> BTW, I haven't heard anything about the patchset for a while.
> What's the status?

It's gone well, and being put into wider use here.  But I'm not
one for monthly updates of large patchsets myself, always too much
to do; and nobody else seemed anxious to have it yet, back in March.

As I said at the time of posting huge tmpfs against v3.19, it was
fully working (and little changed since), but once memory pressure
had disbanded a team to swap it out, there was nothing to put it
together again later, to restore the original hugepage performance.

I couldn't imagine people putting it into real use while that remained
the case, so spent the next months adding "huge tmpfs recovery" -
considered hooking into khugepaged, but settled on work item queued
from fault.

Which has worked out well, except that I had to rush it in before
I went on vacation in June, then spent last month fixing all the
concurrent hole-punching bugs Andres found with his fuzzing while
I was away.  Busy time, stable now; but I do want to reconsider a
few rushed decisions before offering the rebased and extended set.

And there's three pieces of the work not begun:

The page-table allocation delay in mm/memory.c had been great for
the first posting, but not good enough for recovery (replacing ptes
by pmd): for the moment I skate around that by guarding with mmap_sem,
but mmap_sem usually ends up regrettable, and shouldn't be necessary -
there's just a lot of scattered walks to work through, adjusting them
to racy replacement of ptes by pmd.  Maybe I can get away without
doing this for now, we seem to be working well enough without it.

And I suspect that my queueing a recovery work item from fault
is over eager, needs some stats and knobs to tune it down.  Though
not surfaced as a problem yet; and I don't think we could live with
the opposite extreme, of khugepaged lumbering its way around the vmas.

But the one I think I shall have to do something better about before
posting, is NUMA.  For a confluence of reasons, that rule out swapin
readahead for now, it's not a serious issue for us as yet.  But swapin
readahead and NUMA have always been a joke in tmpfs, and I'll be
amplifying that joke with my current NUMA placement in recovery.
Unfortunately, there's a lot of opportunity to make silly mistakes
when trying to get NUMA right: I doubt I can get it right, but do
need to get it a little less wrong before letting others take over.

> 
> While optimizing rmap operations in my patchset (see PG_double_map), I
> found that it would be very tricky to extend team pages to anon-THP
> without a performance regression on the rmap side, due to the number of
> atomic ops it requires.

Thanks for thinking of it: I've been too busy with the recovery
to put more thought into extending teams to anon THP, though I'd
certainly like to try that once the huge tmpfs end is "complete".

Yes, there's not a doubt that starting from compound pages is more
rigid but should involve much less repetition; whereas starting from
the other end with a team of ordinary 4k pages, more flexible but a
lot of wasted effort.  I can't predict where we shall meet.

> 
> Is there any clever approach to the issue?

I'd been hoping that I could implement first, and then optimize away
the unnecessary; but you're right that it's easier to live with that
in the pagecache case, whereas with anon THP it would be a regression.

Hugh

> 
> Team pages are probably fine for file mappings due to a different
> performance baseline. I'm less optimistic about anon-THP.
> 
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: page-flags behavior on compound pages: a worry
@ 2015-08-13  5:10             ` Hugh Dickins
  0 siblings, 0 replies; 119+ messages in thread
From: Hugh Dickins @ 2015-08-13  5:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Kirill A. Shutemov, Andrew Morton,
	Andrea Arcangeli, David Rientjes, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Fri, 7 Aug 2015, Kirill A. Shutemov wrote:
> On Thu, Aug 06, 2015 at 12:24:22PM -0700, Hugh Dickins wrote:
> > 
> > Oh, and I know a patchset which avoids these problems completely,
> > by not using compound pages at all ;)
> 
> BTW, I haven't heard anything about the patchset for a while.
> What's the status?

It's gone well, and being put into wider use here.  But I'm not
one for monthly updates of large patchsets myself, always too much
to do; and nobody else seemed anxious to have it yet, back in March.

As I said at the time of posting huge tmpfs against v3.19, it was
fully working (and little changed since), but once memory pressure
had disbanded a team to swap it out, there was nothing to put it
together again later, to restore the original hugepage performance.

I couldn't imagine people putting it into real use while that remained
the case, so spent the next months adding "huge tmpfs recovery" -
considered hooking into khugepaged, but settled on work item queued
from fault.

Which has worked out well, except that I had to rush it in before
I went on vacation in June, then spent last month fixing all the
concurrent hole-punching bugs Andres found with his fuzzing while
I was away.  Busy time, stable now; but I do want to reconsider a
few rushed decisions before offering the rebased and extended set.

And there are three pieces of the work not yet begun:

The page-table allocation delay in mm/memory.c had been great for
the first posting, but not good enough for recovery (replacing ptes
by pmd): for the moment I skate around that by guarding with mmap_sem,
but mmap_sem usually ends up regrettable, and shouldn't be necessary -
there's just a lot of scattered walks to work through, adjusting them
to racy replacement of ptes by pmd.  Maybe I can get away without
doing this for now, we seem to be working well enough without it.

And I suspect that my queueing a recovery work item from fault
is overeager, and needs some stats and knobs to tune it down.  Though
not surfaced as a problem yet; and I don't think we could live with
the opposite extreme, of khugepaged lumbering its way around the vmas.

But the one I think I shall have to do something better about before
posting, is NUMA.  For a confluence of reasons that rule out swapin
readahead for now, it's not a serious issue for us as yet.  But swapin
readahead and NUMA have always been a joke in tmpfs, and I'll be
amplifying that joke with my current NUMA placement in recovery.
Unfortunately, there's a lot of opportunity to make silly mistakes
when trying to get NUMA right: I doubt I can get it right, but do
need to get it a little less wrong before letting others take over.

> 
> Optimizing rmap operations in my patchset (see PG_double_map), I found
> that it would be very tricky to expand team pages to anon-THP without
> performance regression on rmap side due to the amount of atomic ops it
> requires.

Thanks for thinking of it: I've been too busy with the recovery
to put more thought into extending teams to anon THP, though I'd
certainly like to try that once the huge tmpfs end is "complete".

Yes, there's not a doubt that starting from compound pages is more
rigid but should involve much less repetition; whereas starting from
the other end with a team of ordinary 4k pages, more flexible but a
lot of wasted effort.  I can't predict where we shall meet.

> 
> Is there any clever approach to the issue?

I'd been hoping that I could implement first, and then optimize away
the unnecessary; but you're right that it's easier to live with that
in the pagecache case, whereas with anon THP it would be a regression.

Hugh

> 
> Team pages are probably fine for file mappings due to different performance
> baseline. I'm less optimistic about anon-THP.
> 
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages
  2015-03-19 17:08   ` Kirill A. Shutemov
  (?)
@ 2020-01-31 15:24   ` Chris Wilson
  2020-02-03 15:18     ` Kirill A. Shutemov
  -1 siblings, 1 reply; 119+ messages in thread
From: Chris Wilson @ 2020-01-31 15:24 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Andrew Morton, Hugh Dickins
  Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm, Kirill A. Shutemov

Quoting Kirill A. Shutemov (2015-03-19 17:08:15)
> As far as I can see there's no users of PG_reserved on compound pages.
> Let's use NO_COMPOUND here.

Much later than you would ever expect, but we just had a user update an
ancient device and trip over this.
https://gitlab.freedesktop.org/drm/intel/issues/1027

In drm_pci_alloc() we allocate a high-order page (for it to be physically
contiguous) and mark each page as Reserved.

        dmah->vaddr = dma_alloc_coherent(&dev->pdev->dev, size,
                                         &dmah->busaddr,
                                         GFP_KERNEL | __GFP_COMP);

        /* XXX - Is virt_to_page() legal for consistent mem? */
        /* Reserve */
        for (addr = (unsigned long)dmah->vaddr, sz = size;
             sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
                SetPageReserved(virt_to_page((void *)addr));
        }

It's been doing that since

commit ddf19b973be5a96d77c8467f657fe5bd7d126e0f
Author: Dave Airlie <airlied@linux.ie>
Date:   Sun Mar 19 18:56:12 2006 +1100

    drm: fixup PCI DMA support

I haven't found anything to say if we are meant to be reserving the
pages or not. So I bring it to your attention, asking for help.

Thanks,
-Chris

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages
  2020-01-31 15:24   ` Chris Wilson
@ 2020-02-03 15:18     ` Kirill A. Shutemov
  2020-02-03 15:24       ` Chris Wilson
  2020-02-03 17:29       ` Christoph Hellwig
  0 siblings, 2 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2020-02-03 15:18 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Andrew Morton,
	Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

On Fri, Jan 31, 2020 at 03:24:12PM +0000, Chris Wilson wrote:
> Quoting Kirill A. Shutemov (2015-03-19 17:08:15)
> > As far as I can see there's no users of PG_reserved on compound pages.
> > Let's use NO_COMPOUND here.
> 
> Much later than you would ever expect, but we just had a user update an
> ancient device and trip over this.
> https://gitlab.freedesktop.org/drm/intel/issues/1027
> 
> In drm_pci_alloc() we allocate a high-order page (for it to be physically
> contiguous) and mark each page as Reserved.
> 
>         dmah->vaddr = dma_alloc_coherent(&dev->pdev->dev, size,
>                                          &dmah->busaddr,
>                                          GFP_KERNEL | __GFP_COMP);
> 
>         /* XXX - Is virt_to_page() legal for consistent mem? */
>         /* Reserve */
>         for (addr = (unsigned long)dmah->vaddr, sz = size;
>              sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
>                 SetPageReserved(virt_to_page((void *)addr));
>         }
> 
> It's been doing that since
> 
> commit ddf19b973be5a96d77c8467f657fe5bd7d126e0f
> Author: Dave Airlie <airlied@linux.ie>
> Date:   Sun Mar 19 18:56:12 2006 +1100
> 
>     drm: fixup PCI DMA support
> 
> I haven't found anything to say if we are meant to be reserving the
> pages or not. So I bring it to your attention, asking for help.

I don't see a real reason for these pages to be reserved. But I might be
wrong here.

I tried to look around: other users (infiniband/ethernet) of
dma_alloc_coherent(__GFP_COMP) don't mess with PG_reserved.

Could you try to drop it from DRM?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages
  2020-02-03 15:18     ` Kirill A. Shutemov
@ 2020-02-03 15:24       ` Chris Wilson
  2020-02-03 17:10         ` David Hildenbrand
  2020-02-03 17:29       ` Christoph Hellwig
  1 sibling, 1 reply; 119+ messages in thread
From: Chris Wilson @ 2020-02-03 15:24 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Andrew Morton,
	Hugh Dickins, Dave Hansen, Mel Gorman, Rik van Riel,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
	Steve Capper, Aneesh Kumar K.V, Johannes Weiner, Michal Hocko,
	Jerome Marchand, linux-kernel, linux-mm

Quoting Kirill A. Shutemov (2020-02-03 15:18:44)
> On Fri, Jan 31, 2020 at 03:24:12PM +0000, Chris Wilson wrote:
> > Quoting Kirill A. Shutemov (2015-03-19 17:08:15)
> > > As far as I can see there's no users of PG_reserved on compound pages.
> > > Let's use NO_COMPOUND here.
> > 
> > Much later than you would ever expect, but we just had a user update an
> > ancient device and trip over this.
> > https://gitlab.freedesktop.org/drm/intel/issues/1027
> > 
> > In drm_pci_alloc() we allocate a high-order page (for it to be physically
> > contiguous) and mark each page as Reserved.
> > 
> >         dmah->vaddr = dma_alloc_coherent(&dev->pdev->dev, size,
> >                                          &dmah->busaddr,
> >                                          GFP_KERNEL | __GFP_COMP);
> > 
> >         /* XXX - Is virt_to_page() legal for consistent mem? */
> >         /* Reserve */
> >         for (addr = (unsigned long)dmah->vaddr, sz = size;
> >              sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
> >                 SetPageReserved(virt_to_page((void *)addr));
> >         }
> > 
> > It's been doing that since
> > 
> > commit ddf19b973be5a96d77c8467f657fe5bd7d126e0f
> > Author: Dave Airlie <airlied@linux.ie>
> > Date:   Sun Mar 19 18:56:12 2006 +1100
> > 
> >     drm: fixup PCI DMA support
> > 
> > I haven't found anything to say if we are meant to be reserving the
> > pages or not. So I bring it to your attention, asking for help.
> 
> I don't see a real reason for these pages to be reserved. But I might be
> wrong here.
> 
> I tried to look around: other users (infiniband/ethernet) of
> dma_alloc_coherent(__GFP_COMP) don't mess with PG_reserved.
> 
> Could you try to drop it from DRM?

That is the current plan. So long as there is nothing magical about
either the __GFP_COMP or SetPageReserved, we should be able to drop them
without any functional change. Only 2 very old bits of HW (r128, ancient
i915) depend on this routine, and i915 seems, touch wood, quite happy
with a plain dma_alloc_coherent().
-Chris

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages
  2020-02-03 15:24       ` Chris Wilson
@ 2020-02-03 17:10         ` David Hildenbrand
  0 siblings, 0 replies; 119+ messages in thread
From: David Hildenbrand @ 2020-02-03 17:10 UTC (permalink / raw)
  To: Chris Wilson, Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Andrew Morton,
	Hugh Dickins, Dave Hansen, Mel Gorman, Vlastimil Babka,
	Christoph Lameter, Naoya Horiguchi, Steve Capper,
	Aneesh Kumar K.V, Johannes Weiner, Michal Hocko, Jerome Marchand,
	linux-kernel, linux-mm

On 03.02.20 16:24, Chris Wilson wrote:
> Quoting Kirill A. Shutemov (2020-02-03 15:18:44)
>> On Fri, Jan 31, 2020 at 03:24:12PM +0000, Chris Wilson wrote:
>>> Quoting Kirill A. Shutemov (2015-03-19 17:08:15)
>>>> As far as I can see there's no users of PG_reserved on compound pages.
>>>> Let's use NO_COMPOUND here.
>>>
>>> Much later than you would ever expect, but we just had a user update an
>>> ancient device and trip over this.
>>> https://gitlab.freedesktop.org/drm/intel/issues/1027
>>>
>>> In drm_pci_alloc() we allocate a high-order page (for it to be physically
>>> contiguous) and mark each page as Reserved.
>>>
>>>         dmah->vaddr = dma_alloc_coherent(&dev->pdev->dev, size,
>>>                                          &dmah->busaddr,
>>>                                          GFP_KERNEL | __GFP_COMP);
>>>
>>>         /* XXX - Is virt_to_page() legal for consistent mem? */
>>>         /* Reserve */
>>>         for (addr = (unsigned long)dmah->vaddr, sz = size;
>>>              sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
>>>                 SetPageReserved(virt_to_page((void *)addr));
>>>         }
>>>
>>> It's been doing that since
>>>
>>> commit ddf19b973be5a96d77c8467f657fe5bd7d126e0f
>>> Author: Dave Airlie <airlied@linux.ie>
>>> Date:   Sun Mar 19 18:56:12 2006 +1100
>>>
>>>     drm: fixup PCI DMA support
>>>
>>> I haven't found anything to say if we are meant to be reserving the
>>> pages or not. So I bring it to your attention, asking for help.
>>
>> I don't see a real reason for these pages to be reserved. But I might be
>> wrong here.
>>
>> I tried to look around: other users (infiniband/ethernet) of
>> dma_alloc_coherent(__GFP_COMP) don't mess with PG_reserved.
>>
>> Could you try to drop it from DRM?
> 
> That is the current plan. So long as there is nothing magical about
> either the __GFP_COMP or SetPageReserved, we should be able to drop them
> without any functional change. Only 2 very old bits of HW (r128, ancient
> i915) depend on this routine, and i915 seems, touch wood, quite happy
> with a plain dma_alloc_coherent().

I documented a while ago in include/linux/page-flags.h

"
Pages marked as PG_reserved include:
[...]
MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
not marked PG_reserved (as they might be in use by somebody else who
does not respect the caching strategy).
"

I also removed a bunch of users back then (and even had a patch to
remove this code here), but for this code I think I came to the
conclusion that it might be relevant for some archs.

git grep -o PageReserved | grep ioremap
arch/mips/mm/ioremap.c:PageReserved
arch/nios2/mm/ioremap.c:PageReserved
arch/parisc/mm/ioremap.c:PageReserved
arch/x86/mm/ioremap.c:PageReserved

It would be good to clarify whether this is actually needed here (and
likewise the same pattern in other drivers/ paths) and, if necessary,
update the documentation.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages
  2020-02-03 15:18     ` Kirill A. Shutemov
  2020-02-03 15:24       ` Chris Wilson
@ 2020-02-03 17:29       ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2020-02-03 17:29 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Chris Wilson, Kirill A. Shutemov, Andrea Arcangeli,
	Andrew Morton, Hugh Dickins, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, linux-kernel, linux-mm

On Mon, Feb 03, 2020 at 06:18:44PM +0300, Kirill A. Shutemov wrote:
> > Much later than you would ever expect, but we just had a user update an
> > ancient device and trip over this.
> > https://gitlab.freedesktop.org/drm/intel/issues/1027
> > 
> > In drm_pci_alloc() we allocate a high-order page (for it to be physically
> > contiguous) and mark each page as Reserved.
> > 
> >         dmah->vaddr = dma_alloc_coherent(&dev->pdev->dev, size,
> >                                          &dmah->busaddr,
> >                                          GFP_KERNEL | __GFP_COMP);
> > 
> >         /* XXX - Is virt_to_page() legal for consistent mem? */
> >         /* Reserve */
> >         for (addr = (unsigned long)dmah->vaddr, sz = size;
> >              sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
> >                 SetPageReserved(virt_to_page((void *)addr));
> >         }
> > 
> > It's been doing that since

This code is completely and utterly broken.  Drivers were never allowed
to call virt_to_page() on the memory returned from dma_alloc_coherent
(or pci_alloc_consistent before that), as many implementations return
virtual addresses that are not in the kernel mapping.  So this code
needs to go away, not be papered over.
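
The point about addresses outside the kernel mapping can be illustrated
with a small userspace model. This is only a sketch under stated
assumptions: the model_* names, the base addresses and the 1 GiB size are
invented for illustration and are not the kernel's definitions. It shows
that a translation done by pure arithmetic on a linear mapping gives a
meaningless page frame number for an address from some other region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All constants below are illustrative assumptions, not kernel values. */
#define MODEL_PAGE_SHIFT  12
#define MODEL_LINEAR_BASE 0xffff888000000000ULL /* start of modeled linear map */
#define MODEL_LINEAR_SIZE (1ULL << 30)          /* 1 GiB of modeled RAM */
#define MODEL_REMAP_BASE  0xffffc90000000000ULL /* modeled remapped region */

/* True if vaddr lies inside the modeled linear mapping. */
static bool model_in_linear_map(uint64_t vaddr)
{
	return vaddr >= MODEL_LINEAR_BASE &&
	       vaddr - MODEL_LINEAR_BASE < MODEL_LINEAR_SIZE;
}

/*
 * Translation in the style of virt_to_page(): pure arithmetic with no
 * check that the address belongs to the linear mapping at all.
 */
static uint64_t model_virt_to_pfn(uint64_t vaddr)
{
	return (vaddr - MODEL_LINEAR_BASE) >> MODEL_PAGE_SHIFT;
}

/* True if pfn refers to a page frame that exists in the model. */
static bool model_pfn_valid(uint64_t pfn)
{
	return pfn < (MODEL_LINEAR_SIZE >> MODEL_PAGE_SHIFT);
}
```

In this model an address at MODEL_LINEAR_BASE + 0x5000 translates to
page frame 5, while an address in the remapped region translates to a
frame number far beyond the modeled RAM. A loop that called
SetPageReserved() on such a result would be touching an unrelated or
nonexistent struct page.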

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages
  2015-09-24 14:50 ` [PATCH 00/16] Refreshed page-flags patchset Kirill A. Shutemov
@ 2015-09-24 14:50     ` Kirill A. Shutemov
  0 siblings, 0 replies; 119+ messages in thread
From: Kirill A. Shutemov @ 2015-09-24 14:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Hugh Dickins, Dave Hansen, Mel Gorman,
	Rik van Riel, Vlastimil Babka, Christoph Lameter,
	Naoya Horiguchi, Steve Capper, Aneesh Kumar K.V, Johannes Weiner,
	Michal Hocko, Jerome Marchand, Sasha Levin, linux-kernel,
	linux-mm, Kirill A. Shutemov

As far as I can see there's no users of PG_reserved on compound pages.
Let's use PF_NO_COMPOUND here.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-flags.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index adaa2b39f471..5ba8130fffb5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -264,7 +264,8 @@ PAGEFLAG(Pinned, pinned, PF_NO_COMPOUND)
 PAGEFLAG(SavePinned, savepinned, PF_NO_COMPOUND);
 PAGEFLAG(Foreign, foreign, PF_NO_COMPOUND);
 
-PAGEFLAG(Reserved, reserved, PF_ANY) __CLEARPAGEFLAG(Reserved, reserved, PF_ANY)
+PAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
+	__CLEARPAGEFLAG(Reserved, reserved, PF_NO_COMPOUND)
 PAGEFLAG(SwapBacked, swapbacked, PF_ANY)
 	__CLEARPAGEFLAG(SwapBacked, swapbacked, PF_ANY)
 	__SETPAGEFLAG(SwapBacked, swapbacked, PF_ANY)
-- 
2.5.1
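
The effect of switching PG_reserved from PF_ANY to PF_NO_COMPOUND can be
sketched with a userspace model. Everything named model_* below is
hypothetical and heavily simplified; it is not the kernel's struct page or
its PAGEFLAG() machinery. Where this model returns -1, a kernel built with
VM debugging would be expected to hit an assertion instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_PAGES   16
#define MODEL_PG_RESERVED (1UL << 0)

/* Simplified stand-in for struct page; fields are illustrative only. */
struct model_page {
	unsigned long flags;
	struct model_page *head; /* non-NULL only for tail pages */
	bool is_head;            /* true for the head of a compound page */
};

enum model_policy { MODEL_PF_ANY, MODEL_PF_NO_COMPOUND };

/* A page is compound if it is either a head or a tail page. */
static bool model_page_compound(const struct model_page *p)
{
	return p->is_head || p->head != NULL;
}

/* Returns 0 on success, -1 if the policy forbids setting the flag. */
static int model_set_reserved(struct model_page *p, enum model_policy policy)
{
	if (policy == MODEL_PF_NO_COMPOUND && model_page_compound(p))
		return -1;
	p->flags |= MODEL_PG_RESERVED;
	return 0;
}

/*
 * Build n pages, optionally linked as one compound page, then try to
 * mark each page reserved, as the drm_pci_alloc() loop does.
 * Returns the number of policy violations.
 */
static int model_count_violations(size_t n, bool compound,
				  enum model_policy policy)
{
	struct model_page pages[MODEL_MAX_PAGES];
	int violations = 0;
	size_t i;

	assert(n >= 1 && n <= MODEL_MAX_PAGES);
	memset(pages, 0, sizeof(pages));
	if (compound) {
		pages[0].is_head = true;
		for (i = 1; i < n; i++)
			pages[i].head = &pages[0];
	}
	for (i = 0; i < n; i++)
		if (model_set_reserved(&pages[i], policy) != 0)
			violations++;
	return violations;
}
```

With four pages linked as one compound page, the model reports no
violations under MODEL_PF_ANY and four under MODEL_PF_NO_COMPOUND, which
matches the shape of the drm_pci_alloc() report elsewhere in this thread:
an allocation made with __GFP_COMP followed by SetPageReserved() on every
page.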


^ permalink raw reply related	[flat|nested] 119+ messages in thread

end of thread, other threads:[~2020-02-03 17:29 UTC | newest]

Thread overview: 119+ messages
2015-03-19 17:08 [PATCH 00/16] Sanitize usage of ->flags and ->mapping for tail pages Kirill A. Shutemov
2015-03-19 17:08 ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 01/16] mm: consolidate all page-flags helpers in <linux/page-flags.h> Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-23  0:10   ` Hugh Dickins
2015-03-23  0:10     ` Hugh Dickins
2015-03-19 17:08 ` [PATCH 02/16] page-flags: trivial cleanup for PageTrans* helpers Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-23  0:12   ` Hugh Dickins
2015-03-23  0:12     ` Hugh Dickins
2015-03-19 17:08 ` [PATCH 03/16] page-flags: introduce page flags policies wrt compound pages Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-20 20:35   ` Andrew Morton
2015-03-20 20:35     ` Andrew Morton
2015-03-20 21:34     ` Kirill A. Shutemov
2015-03-20 21:34       ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 04/16] page-flags: define PG_locked behavior on " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-27 15:11   ` Mateusz Krawczuk
2015-03-27 15:11     ` Mateusz Krawczuk
2015-03-27 15:13   ` Mateusz Krawczuk
2015-03-27 15:13     ` Mateusz Krawczuk
2015-03-27 15:13     ` Mateusz Krawczuk
2015-03-27 16:37     ` Kirill A. Shutemov
2015-03-27 16:37       ` Kirill A. Shutemov
2015-03-27 16:37       ` Kirill A. Shutemov
2015-07-15 20:20   ` Christoph Lameter
2015-07-15 20:20     ` Christoph Lameter
2015-08-06  4:15   ` page-flags behavior on compound pages: a worry Hugh Dickins
2015-08-06  4:15     ` Hugh Dickins
2015-08-06 15:33     ` Kirill A. Shutemov
2015-08-06 15:33       ` Kirill A. Shutemov
2015-08-06 19:24       ` Hugh Dickins
2015-08-06 19:24         ` Hugh Dickins
2015-08-06 20:45         ` Christoph Lameter
2015-08-06 20:45           ` Christoph Lameter
2015-08-07 14:50           ` Kirill A. Shutemov
2015-08-07 14:50             ` Kirill A. Shutemov
2015-08-07 15:28             ` Christoph Lameter
2015-08-07 15:28               ` Christoph Lameter
2015-08-10 11:09               ` Kirill A. Shutemov
2015-08-10 11:09                 ` Kirill A. Shutemov
2015-08-10 13:50                 ` Christoph Lameter
2015-08-10 13:50                   ` Christoph Lameter
2015-08-07 14:49         ` Kirill A. Shutemov
2015-08-07 14:49           ` Kirill A. Shutemov
2015-08-13  5:10           ` Hugh Dickins
2015-08-13  5:10             ` Hugh Dickins
2015-08-12 14:35         ` Kirill A. Shutemov
2015-08-12 14:35           ` Kirill A. Shutemov
2015-08-12 14:47           ` Vlastimil Babka
2015-08-12 14:47             ` Vlastimil Babka
2015-08-12 21:16           ` Andrew Morton
2015-08-12 21:16             ` Andrew Morton
2015-08-12 22:21             ` Kirill A. Shutemov
2015-08-12 22:21               ` Kirill A. Shutemov
2015-08-13  4:12               ` Hugh Dickins
2015-08-13  4:12                 ` Hugh Dickins
2015-03-19 17:08 ` [PATCH 05/16] page-flags: define behavior of FS/IO-related flags on compound pages Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 18:29   ` Dave Hansen
2015-03-19 18:29     ` Dave Hansen
2015-03-19 20:02     ` Kirill A. Shutemov
2015-03-19 20:02       ` Kirill A. Shutemov
2015-03-23  0:02       ` Hugh Dickins
2015-03-23  0:02         ` Hugh Dickins
2015-03-23 12:17         ` Kirill A. Shutemov
2015-03-23 12:17           ` Kirill A. Shutemov
2015-03-24 22:54           ` Hugh Dickins
2015-03-24 22:54             ` Hugh Dickins
2015-03-25 10:23             ` Kirill A. Shutemov
2015-03-25 10:23               ` Kirill A. Shutemov
2015-03-25 18:56               ` Hugh Dickins
2015-03-25 18:56                 ` Hugh Dickins
2015-03-19 17:08 ` [PATCH 06/16] page-flags: define behavior of LRU-related " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 07/16] page-flags: define behavior SL*B-related " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 08/16] page-flags: define behavior of Xen-related " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 09/16] page-flags: define PG_reserved behavior " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2020-01-31 15:24   ` Chris Wilson
2020-02-03 15:18     ` Kirill A. Shutemov
2020-02-03 15:24       ` Chris Wilson
2020-02-03 17:10         ` David Hildenbrand
2020-02-03 17:29       ` Christoph Hellwig
2015-03-19 17:08 ` [PATCH 10/16] page-flags: define PG_swapbacked " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 11/16] page-flags: define PG_swapcache " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 12/16] page-flags: define PG_mlocked " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 13/16] page-flags: define PG_uncached " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 14/16] page-flags: define PG_uptodate " Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 15/16] page-flags: look on head page if the flag is encoded in page->mapping Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-19 17:08 ` [PATCH 16/16] mm: sanitize page->mapping for tail pages Kirill A. Shutemov
2015-03-19 17:08   ` Kirill A. Shutemov
2015-03-23  0:28 ` [PATCH 00/16] Sanitize usage of ->flags and ->mapping " Hugh Dickins
2015-03-23  0:28   ` Hugh Dickins
2015-03-23 10:04   ` Kirill A. Shutemov
2015-03-23 10:04     ` Kirill A. Shutemov
2015-03-24 23:42     ` Hugh Dickins
2015-03-24 23:42       ` Hugh Dickins
2015-03-25 10:55       ` Kirill A. Shutemov
2015-03-25 10:55         ` Kirill A. Shutemov
2015-03-24 17:39 ` Konstantin Khlebnikov
2015-03-24 17:39   ` Konstantin Khlebnikov
2015-03-24 20:04   ` Kirill A. Shutemov
2015-03-24 20:04     ` Kirill A. Shutemov
2015-07-15 20:20 ` Christoph Lameter
2015-07-15 20:20   ` Christoph Lameter
2015-07-15 21:18   ` Kirill A. Shutemov
2015-07-15 21:18     ` Kirill A. Shutemov
2015-09-21 22:35 [PATCH 3/3] page-flags: rectify forward declaration Andrew Morton
2015-09-24 14:50 ` [PATCH 00/16] Refreshed page-flags patchset Kirill A. Shutemov
2015-09-24 14:50   ` [PATCH 09/16] page-flags: define PG_reserved behavior on compound pages Kirill A. Shutemov
2015-09-24 14:50     ` Kirill A. Shutemov
