All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] Fix few rmap-related THP bugs
@ 2017-01-24 16:28 ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

The patch fixes handing PTE-mapped THPs in page_referenced() and
page_idle_clear_pte_refs().

To achieve that I've intrdocued new helper -- page_check_walk() -- which
replaces all page_check_address{,_transhuge}() and covers all THP cases.

Patchset overview:
  - First patch fixes one uprobe bug (unrelated to the rest of the
    patchset, just spotted it at the same time);

  - Patches 2-5 fix handling PTE-mapped THPs in page_referenced(),
    page_idle_clear_pte_refs() and rmap core;

  - Patches 6-12 convert all page_check_address{,_transhuge}() users (plus
    remove_migration_pte()) to page_check_walk() and drop unused helpers.

I think the fixes are not critical enough for stable@ as they don't lead
to crashes or hangs, only suboptimal behaviour.

Please review and consider applying.

Kirill A. Shutemov (12):
  uprobes: split THPs before trying replace them
  mm: introduce page_check_walk()
  mm: fix handling PTE-mapped THPs in page_referenced()
  mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs()
  mm, rmap: check all VMAs that PTE-mapped THP can be part of
  mm: convert page_mkclean_one() to page_check_walk()
  mm: convert try_to_unmap_one() to page_check_walk()
  mm, ksm: convert write_protect_page() to page_check_walk()
  mm, uprobes: convert __replace_page() to page_check_walk()
  mm: convert page_mapped_in_vma() to page_check_walk()
  mm: drop page_check_address{,_transhuge}
  mm: convert remove_migration_pte() to page_check_walk()

 include/linux/rmap.h    |  85 ++++---
 kernel/events/uprobes.c |  26 ++-
 mm/Makefile             |   6 +-
 mm/huge_memory.c        |  25 +--
 mm/internal.h           |   9 +-
 mm/ksm.c                |  34 +--
 mm/migrate.c            | 103 ++++-----
 mm/page_check.c         | 178 +++++++++++++++
 mm/page_idle.c          |  34 +--
 mm/rmap.c               | 572 +++++++++++++++++++-----------------------------
 10 files changed, 567 insertions(+), 505 deletions(-)
 create mode 100644 mm/page_check.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 00/12] Fix few rmap-related THP bugs
@ 2017-01-24 16:28 ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

The patch fixes handing PTE-mapped THPs in page_referenced() and
page_idle_clear_pte_refs().

To achieve that I've intrdocued new helper -- page_check_walk() -- which
replaces all page_check_address{,_transhuge}() and covers all THP cases.

Patchset overview:
  - First patch fixes one uprobe bug (unrelated to the rest of the
    patchset, just spotted it at the same time);

  - Patches 2-5 fix handling PTE-mapped THPs in page_referenced(),
    page_idle_clear_pte_refs() and rmap core;

  - Patches 6-12 convert all page_check_address{,_transhuge}() users (plus
    remove_migration_pte()) to page_check_walk() and drop unused helpers.

I think the fixes are not critical enough for stable@ as they don't lead
to crashes or hangs, only suboptimal behaviour.

Please review and consider applying.

Kirill A. Shutemov (12):
  uprobes: split THPs before trying replace them
  mm: introduce page_check_walk()
  mm: fix handling PTE-mapped THPs in page_referenced()
  mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs()
  mm, rmap: check all VMAs that PTE-mapped THP can be part of
  mm: convert page_mkclean_one() to page_check_walk()
  mm: convert try_to_unmap_one() to page_check_walk()
  mm, ksm: convert write_protect_page() to page_check_walk()
  mm, uprobes: convert __replace_page() to page_check_walk()
  mm: convert page_mapped_in_vma() to page_check_walk()
  mm: drop page_check_address{,_transhuge}
  mm: convert remove_migration_pte() to page_check_walk()

 include/linux/rmap.h    |  85 ++++---
 kernel/events/uprobes.c |  26 ++-
 mm/Makefile             |   6 +-
 mm/huge_memory.c        |  25 +--
 mm/internal.h           |   9 +-
 mm/ksm.c                |  34 +--
 mm/migrate.c            | 103 ++++-----
 mm/page_check.c         | 178 +++++++++++++++
 mm/page_idle.c          |  34 +--
 mm/rmap.c               | 572 +++++++++++++++++++-----------------------------
 10 files changed, 567 insertions(+), 505 deletions(-)
 create mode 100644 mm/page_check.c

-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov, Oleg Nesterov,
	Peter Zijlstra

For THPs page_check_address() always fails. It's better to split them
first before trying to replace.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/events/uprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d416f3baf392..1e65c79e52a6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -300,8 +300,8 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
 
 retry:
 	/* Read the page with vaddr into memory */
-	ret = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &old_page,
-			&vma, NULL);
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+			FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL);
 	if (ret <= 0)
 		return ret;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov, Oleg Nesterov,
	Peter Zijlstra

For THPs page_check_address() always fails. It's better to split them
first before trying to replace.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/events/uprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d416f3baf392..1e65c79e52a6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -300,8 +300,8 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
 
 retry:
 	/* Read the page with vaddr into memory */
-	ret = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &old_page,
-			&vma, NULL);
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+			FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL);
 	if (ret <= 0)
 		return ret;
 
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 02/12] mm: introduce page_check_walk()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

The patch introduce new interface to check if a page is mapped into a vma.
It aims to address shortcomings of page_check_address{,_transhuge}.

Existing interface is not able to handle PTE-mapped THPs: it only finds
the first PTE. The rest lefted unnoticed.

page_check_walk() iterates over all possible mapping of the page in the
vma.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  65 ++++++++++++++++++++++
 mm/Makefile          |   6 ++-
 mm/huge_memory.c     |   9 ++--
 mm/page_check.c      | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 mm/page_check.c

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 15321fb1df6b..474279810742 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -232,6 +232,71 @@ static inline bool page_check_address_transhuge(struct page *page,
 }
 #endif
 
+/* Avoid racy checks */
+#define PAGE_CHECK_WALK_SYNC		(1 << 0)
+/* Look for migarion entries rather than present ptes */
+#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
+
+struct page_check_walk {
+	struct page *page;
+	struct vm_area_struct *vma;
+	unsigned long address;
+	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;
+	unsigned int flags;
+};
+
+static inline void page_check_walk_done(struct page_check_walk *pcw)
+{
+	if (pcw->pte)
+		pte_unmap(pcw->pte);
+	if (pcw->ptl)
+		spin_unlock(pcw->ptl);
+}
+
+bool __page_check_walk(struct page_check_walk *pcw);
+
+/**
+ * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address
+ * @pcw: pointer to struce page_check_walk. page, vma and address must be set.
+ *
+ * Returns true, if the page is mapped in the vma. @pcw->pmd and @pcw->pte point
+ * to relevant page table entries. @pcw->ptl is locked. @pcw->address is
+ * adjusted if needed (for PTE-mapped THPs).
+ *
+ * If @pcw->pmd is set, but @pcw->pte is not, you have found PMD-mapped page
+ * (usually THP). For PTE-mapped THP, you should run page_check_walk() in 
+ * a loop to find all PTEs that maps the THP.
+ *
+ * For HugeTLB pages, @pcw->pte is set to relevant page table entry regardless
+ * which page table level the page mapped at. @pcw->pmd is NULL.
+ *
+ * Retruns false, if there's no more page table entries for the page in the vma.
+ * @pcw->ptl is unlocked and @pcw->pte is unmapped.
+ *
+ * If you need to stop the walk before page_check_walk() returned false, use
+ * page_check_walk_done(). It will do the housekeeping.
+ */
+static inline bool page_check_walk(struct page_check_walk *pcw)
+{
+	/* The only possible pmd mapping has been handled on last iteration */
+	if (pcw->pmd && !pcw->pte) {
+		page_check_walk_done(pcw);
+		return false;
+	}
+
+	/* Only for THP, seek to next pte entry makes sense */
+	if (pcw->pte) {
+		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
+			page_check_walk_done(pcw);
+			return false;
+		}
+	}
+
+	return __page_check_walk(pcw);
+}
+
 /*
  * Used by swapoff to help locate where page is expected in vma.
  */
diff --git a/mm/Makefile b/mm/Makefile
index 295bd7a9f76b..d8d2b2429557 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -23,8 +23,10 @@ KCOV_INSTRUMENT_vmstat.o := n
 
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
-			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o pagewalk.o pgtable-generic.o
+			   mlock.o mmap.o mprotect.o mremap.o msync.o \
+			   page_check.o pagewalk.o pgtable-generic.o rmap.o \
+			   vmalloc.o
+
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
 mmu-$(CONFIG_MMU)	+= process_vm_access.o
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9a6bd6c8d55a..16820e001d79 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1862,9 +1862,12 @@ static void freeze_page(struct page *page)
 static void unfreeze_page(struct page *page)
 {
 	int i;
-
-	for (i = 0; i < HPAGE_PMD_NR; i++)
-		remove_migration_ptes(page + i, page + i, true);
+	if (PageTransHuge(page)) {
+		remove_migration_ptes(page, page, true);
+	} else {
+		for (i = 0; i < HPAGE_PMD_NR; i++)
+			remove_migration_ptes(page + i, page + i, true);
+	}
 }
 
 static void __split_huge_page_tail(struct page *head, int tail,
diff --git a/mm/page_check.c b/mm/page_check.c
new file mode 100644
index 000000000000..d4b3536a6bf2
--- /dev/null
+++ b/mm/page_check.c
@@ -0,0 +1,148 @@
+#include <linux/mm.h>
+#include <linux/rmap.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+
+#include "internal.h"
+
+static inline bool check_pmd(struct page_check_walk *pcw)
+{
+	pmd_t pmde = *pcw->pmd;
+	barrier();
+	return pmd_present(pmde) && !pmd_trans_huge(pmde);
+}
+
+static inline bool not_found(struct page_check_walk *pcw)
+{
+	page_check_walk_done(pcw);
+	return false;
+}
+
+static inline bool map_pte(struct page_check_walk *pcw)
+{
+	pcw->pte = pte_offset_map(pcw->pmd, pcw->address);
+	if (!(pcw->flags & PAGE_CHECK_WALK_SYNC)) {
+		if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
+			if (!is_swap_pte(*pcw->pte))
+				return false;
+		} else {
+			if (!pte_present(*pcw->pte))
+				return false;
+		}
+	}
+	pcw->ptl = pte_lockptr(pcw->vma->vm_mm, pcw->pmd);
+	spin_lock(pcw->ptl);
+	return true;
+}
+
+static inline bool check_pte(struct page_check_walk *pcw)
+{
+	if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
+		swp_entry_t entry;
+		if (!is_swap_pte(*pcw->pte))
+			return false;
+		entry = pte_to_swp_entry(*pcw->pte);
+		if (!is_migration_entry(entry))
+			return false;
+		if (migration_entry_to_page(entry) - pcw->page >=
+				hpage_nr_pages(pcw->page)) {
+			return false;
+		}
+		if (migration_entry_to_page(entry) < pcw->page)
+			return false;
+	} else {
+		if (!pte_present(*pcw->pte))
+			return false;
+
+		/* THP can be referenced by any subpage */
+		if (pte_page(*pcw->pte) - pcw->page >=
+				hpage_nr_pages(pcw->page)) {
+			return false;
+		}
+		if (pte_page(*pcw->pte) < pcw->page)
+			return false;
+	}
+
+	return true;
+}
+
+bool __page_check_walk(struct page_check_walk *pcw)
+{
+	struct mm_struct *mm = pcw->vma->vm_mm;
+	struct page *page = pcw->page;
+	pgd_t *pgd;
+	pud_t *pud;
+
+	/* For THP, seek to next pte entry */
+	if (pcw->pte)
+		goto next_pte;
+
+	if (unlikely(PageHuge(pcw->page))) {
+		/* when pud is not present, pte will be NULL */
+		pcw->pte = huge_pte_offset(mm, pcw->address);
+		if (!pcw->pte)
+			return false;
+
+		pcw->ptl = huge_pte_lockptr(page_hstate(page), mm, pcw->pte);
+		spin_lock(pcw->ptl);
+		if (!check_pte(pcw))
+			return not_found(pcw);
+		return true;
+	}
+restart:
+	pgd = pgd_offset(mm, pcw->address);
+	if (!pgd_present(*pgd))
+		return false;
+	pud = pud_offset(pgd, pcw->address);
+	if (!pud_present(*pud))
+		return false;
+	pcw->pmd = pmd_offset(pud, pcw->address);
+	if (pmd_trans_huge(*pcw->pmd)) {
+		pcw->ptl = pmd_lock(mm, pcw->pmd);
+		if (!pmd_present(*pcw->pmd))
+			return not_found(pcw);
+		if (likely(pmd_trans_huge(*pcw->pmd))) {
+			if (pcw->flags & PAGE_CHECK_WALK_MIGRATION)
+				return not_found(pcw);
+			if (pmd_page(*pcw->pmd) != page)
+				return not_found(pcw);
+			return true;
+		} else {
+			/* THP pmd was split under us: handle on pte level */
+			spin_unlock(pcw->ptl);
+			pcw->ptl = NULL;
+		}
+	} else {
+		if (!check_pmd(pcw))
+			return false;
+	}
+	if (!map_pte(pcw))
+		goto next_pte;
+	while (1) {
+		if (check_pte(pcw))
+			return true;
+next_pte:	do {
+			pcw->address += PAGE_SIZE;
+			if (pcw->address >= __vma_address(pcw->page, pcw->vma) +
+					hpage_nr_pages(pcw->page) * PAGE_SIZE)
+				return not_found(pcw);
+			/* Did we cross page table boundary? */
+			if (pcw->address % PMD_SIZE == 0) {
+				pte_unmap(pcw->pte);
+				if (pcw->ptl) {
+					spin_unlock(pcw->ptl);
+					pcw->ptl = NULL;
+				}
+				goto restart;
+			} else {
+				pcw->pte++;
+			}
+		} while (pte_none(*pcw->pte));
+
+		if (!pcw->ptl) {
+			pcw->ptl = pte_lockptr(mm, pcw->pmd);
+			spin_lock(pcw->ptl);
+		}
+	}
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 02/12] mm: introduce page_check_walk()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

The patch introduce new interface to check if a page is mapped into a vma.
It aims to address shortcomings of page_check_address{,_transhuge}.

Existing interface is not able to handle PTE-mapped THPs: it only finds
the first PTE. The rest lefted unnoticed.

page_check_walk() iterates over all possible mapping of the page in the
vma.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  65 ++++++++++++++++++++++
 mm/Makefile          |   6 ++-
 mm/huge_memory.c     |   9 ++--
 mm/page_check.c      | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 mm/page_check.c

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 15321fb1df6b..474279810742 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -232,6 +232,71 @@ static inline bool page_check_address_transhuge(struct page *page,
 }
 #endif
 
+/* Avoid racy checks */
+#define PAGE_CHECK_WALK_SYNC		(1 << 0)
+/* Look for migarion entries rather than present ptes */
+#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
+
+struct page_check_walk {
+	struct page *page;
+	struct vm_area_struct *vma;
+	unsigned long address;
+	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;
+	unsigned int flags;
+};
+
+static inline void page_check_walk_done(struct page_check_walk *pcw)
+{
+	if (pcw->pte)
+		pte_unmap(pcw->pte);
+	if (pcw->ptl)
+		spin_unlock(pcw->ptl);
+}
+
+bool __page_check_walk(struct page_check_walk *pcw);
+
+/**
+ * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address
+ * @pcw: pointer to struce page_check_walk. page, vma and address must be set.
+ *
+ * Returns true, if the page is mapped in the vma. @pcw->pmd and @pcw->pte point
+ * to relevant page table entries. @pcw->ptl is locked. @pcw->address is
+ * adjusted if needed (for PTE-mapped THPs).
+ *
+ * If @pcw->pmd is set, but @pcw->pte is not, you have found PMD-mapped page
+ * (usually THP). For PTE-mapped THP, you should run page_check_walk() in 
+ * a loop to find all PTEs that maps the THP.
+ *
+ * For HugeTLB pages, @pcw->pte is set to relevant page table entry regardless
+ * which page table level the page mapped at. @pcw->pmd is NULL.
+ *
+ * Retruns false, if there's no more page table entries for the page in the vma.
+ * @pcw->ptl is unlocked and @pcw->pte is unmapped.
+ *
+ * If you need to stop the walk before page_check_walk() returned false, use
+ * page_check_walk_done(). It will do the housekeeping.
+ */
+static inline bool page_check_walk(struct page_check_walk *pcw)
+{
+	/* The only possible pmd mapping has been handled on last iteration */
+	if (pcw->pmd && !pcw->pte) {
+		page_check_walk_done(pcw);
+		return false;
+	}
+
+	/* Only for THP, seek to next pte entry makes sense */
+	if (pcw->pte) {
+		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
+			page_check_walk_done(pcw);
+			return false;
+		}
+	}
+
+	return __page_check_walk(pcw);
+}
+
 /*
  * Used by swapoff to help locate where page is expected in vma.
  */
diff --git a/mm/Makefile b/mm/Makefile
index 295bd7a9f76b..d8d2b2429557 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -23,8 +23,10 @@ KCOV_INSTRUMENT_vmstat.o := n
 
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
-			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o pagewalk.o pgtable-generic.o
+			   mlock.o mmap.o mprotect.o mremap.o msync.o \
+			   page_check.o pagewalk.o pgtable-generic.o rmap.o \
+			   vmalloc.o
+
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
 mmu-$(CONFIG_MMU)	+= process_vm_access.o
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9a6bd6c8d55a..16820e001d79 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1862,9 +1862,12 @@ static void freeze_page(struct page *page)
 static void unfreeze_page(struct page *page)
 {
 	int i;
-
-	for (i = 0; i < HPAGE_PMD_NR; i++)
-		remove_migration_ptes(page + i, page + i, true);
+	if (PageTransHuge(page)) {
+		remove_migration_ptes(page, page, true);
+	} else {
+		for (i = 0; i < HPAGE_PMD_NR; i++)
+			remove_migration_ptes(page + i, page + i, true);
+	}
 }
 
 static void __split_huge_page_tail(struct page *head, int tail,
diff --git a/mm/page_check.c b/mm/page_check.c
new file mode 100644
index 000000000000..d4b3536a6bf2
--- /dev/null
+++ b/mm/page_check.c
@@ -0,0 +1,148 @@
+#include <linux/mm.h>
+#include <linux/rmap.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+
+#include "internal.h"
+
+static inline bool check_pmd(struct page_check_walk *pcw)
+{
+	pmd_t pmde = *pcw->pmd;
+	barrier();
+	return pmd_present(pmde) && !pmd_trans_huge(pmde);
+}
+
+static inline bool not_found(struct page_check_walk *pcw)
+{
+	page_check_walk_done(pcw);
+	return false;
+}
+
+static inline bool map_pte(struct page_check_walk *pcw)
+{
+	pcw->pte = pte_offset_map(pcw->pmd, pcw->address);
+	if (!(pcw->flags & PAGE_CHECK_WALK_SYNC)) {
+		if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
+			if (!is_swap_pte(*pcw->pte))
+				return false;
+		} else {
+			if (!pte_present(*pcw->pte))
+				return false;
+		}
+	}
+	pcw->ptl = pte_lockptr(pcw->vma->vm_mm, pcw->pmd);
+	spin_lock(pcw->ptl);
+	return true;
+}
+
+static inline bool check_pte(struct page_check_walk *pcw)
+{
+	if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
+		swp_entry_t entry;
+		if (!is_swap_pte(*pcw->pte))
+			return false;
+		entry = pte_to_swp_entry(*pcw->pte);
+		if (!is_migration_entry(entry))
+			return false;
+		if (migration_entry_to_page(entry) - pcw->page >=
+				hpage_nr_pages(pcw->page)) {
+			return false;
+		}
+		if (migration_entry_to_page(entry) < pcw->page)
+			return false;
+	} else {
+		if (!pte_present(*pcw->pte))
+			return false;
+
+		/* THP can be referenced by any subpage */
+		if (pte_page(*pcw->pte) - pcw->page >=
+				hpage_nr_pages(pcw->page)) {
+			return false;
+		}
+		if (pte_page(*pcw->pte) < pcw->page)
+			return false;
+	}
+
+	return true;
+}
+
+bool __page_check_walk(struct page_check_walk *pcw)
+{
+	struct mm_struct *mm = pcw->vma->vm_mm;
+	struct page *page = pcw->page;
+	pgd_t *pgd;
+	pud_t *pud;
+
+	/* For THP, seek to next pte entry */
+	if (pcw->pte)
+		goto next_pte;
+
+	if (unlikely(PageHuge(pcw->page))) {
+		/* when pud is not present, pte will be NULL */
+		pcw->pte = huge_pte_offset(mm, pcw->address);
+		if (!pcw->pte)
+			return false;
+
+		pcw->ptl = huge_pte_lockptr(page_hstate(page), mm, pcw->pte);
+		spin_lock(pcw->ptl);
+		if (!check_pte(pcw))
+			return not_found(pcw);
+		return true;
+	}
+restart:
+	pgd = pgd_offset(mm, pcw->address);
+	if (!pgd_present(*pgd))
+		return false;
+	pud = pud_offset(pgd, pcw->address);
+	if (!pud_present(*pud))
+		return false;
+	pcw->pmd = pmd_offset(pud, pcw->address);
+	if (pmd_trans_huge(*pcw->pmd)) {
+		pcw->ptl = pmd_lock(mm, pcw->pmd);
+		if (!pmd_present(*pcw->pmd))
+			return not_found(pcw);
+		if (likely(pmd_trans_huge(*pcw->pmd))) {
+			if (pcw->flags & PAGE_CHECK_WALK_MIGRATION)
+				return not_found(pcw);
+			if (pmd_page(*pcw->pmd) != page)
+				return not_found(pcw);
+			return true;
+		} else {
+			/* THP pmd was split under us: handle on pte level */
+			spin_unlock(pcw->ptl);
+			pcw->ptl = NULL;
+		}
+	} else {
+		if (!check_pmd(pcw))
+			return false;
+	}
+	if (!map_pte(pcw))
+		goto next_pte;
+	while (1) {
+		if (check_pte(pcw))
+			return true;
+next_pte:	do {
+			pcw->address += PAGE_SIZE;
+			if (pcw->address >= __vma_address(pcw->page, pcw->vma) +
+					hpage_nr_pages(pcw->page) * PAGE_SIZE)
+				return not_found(pcw);
+			/* Did we cross page table boundary? */
+			if (pcw->address % PMD_SIZE == 0) {
+				pte_unmap(pcw->pte);
+				if (pcw->ptl) {
+					spin_unlock(pcw->ptl);
+					pcw->ptl = NULL;
+				}
+				goto restart;
+			} else {
+				pcw->pte++;
+			}
+		} while (pte_none(*pcw->pte));
+
+		if (!pcw->ptl) {
+			pcw->ptl = pte_lockptr(mm, pcw->pmd);
+			spin_lock(pcw->ptl);
+		}
+	}
+}
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 03/12] mm: fix handling PTE-mapped THPs in page_referenced()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For PTE-mapped THP page_check_address_transhuge() is not adequate: it
cannot find all relevant PTEs, only the first one. It means we can miss
some references of the page and it can result in suboptimal decisions by
vmscan.

Let's switch it to page_check_walk().

I don't think it's subject for stable@: it's not fatal. The only side
effect is that THP can be swapped out when it shouldn't.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/rmap.c | 66 ++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 91619fd70939..d7a0f5121c65 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -886,45 +886,48 @@ struct page_referenced_arg {
 static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 			unsigned long address, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
 	struct page_referenced_arg *pra = arg;
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
 	int referenced = 0;
 
-	if (!page_check_address_transhuge(page, mm, address, &pmd, &pte, &ptl))
-		return SWAP_AGAIN;
+	while (page_check_walk(&pcw)) {
+		address = pcw.address;
 
-	if (vma->vm_flags & VM_LOCKED) {
-		if (pte)
-			pte_unmap(pte);
-		spin_unlock(ptl);
-		pra->vm_flags |= VM_LOCKED;
-		return SWAP_FAIL; /* To break the loop */
-	}
+		if (vma->vm_flags & VM_LOCKED) {
+			page_check_walk_done(&pcw);
+			pra->vm_flags |= VM_LOCKED;
+			return SWAP_FAIL; /* To break the loop */
+		}
 
-	if (pte) {
-		if (ptep_clear_flush_young_notify(vma, address, pte)) {
-			/*
-			 * Don't treat a reference through a sequentially read
-			 * mapping as such.  If the page has been used in
-			 * another mapping, we will catch it; if this other
-			 * mapping is already gone, the unmap path will have
-			 * set PG_referenced or activated the page.
-			 */
-			if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+		if (pcw.pte) {
+			if (ptep_clear_flush_young_notify(vma, address,
+						pcw.pte)) {
+				/*
+				 * Don't treat a reference through
+				 * a sequentially read mapping as such.
+				 * If the page has been used in another mapping,
+				 * we will catch it; if this other mapping is
+				 * already gone, the unmap path will have set
+				 * PG_referenced or activated the page.
+				 */
+				if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+					referenced++;
+			}
+		} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+			if (pmdp_clear_flush_young_notify(vma, address,
+						pcw.pmd))
 				referenced++;
+		} else {
+			/* unexpected pmd-mapped page? */
+			WARN_ON_ONCE(1);
 		}
-		pte_unmap(pte);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		if (pmdp_clear_flush_young_notify(vma, address, pmd))
-			referenced++;
-	} else {
-		/* unexpected pmd-mapped page? */
-		WARN_ON_ONCE(1);
+
+		pra->mapcount--;
 	}
-	spin_unlock(ptl);
 
 	if (referenced)
 		clear_page_idle(page);
@@ -936,7 +939,6 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 		pra->vm_flags |= vma->vm_flags;
 	}
 
-	pra->mapcount--;
 	if (!pra->mapcount)
 		return SWAP_SUCCESS; /* To break the loop */
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 03/12] mm: fix handling PTE-mapped THPs in page_referenced()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For PTE-mapped THP page_check_address_transhuge() is not adequate: it
cannot find all relevant PTEs, only the first one. It means we can miss
some references of the page and it can result in suboptimal decisions by
vmscan.

Let's switch it to page_check_walk().

I don't think it's subject for stable@: it's not fatal. The only side
effect is that THP can be swapped out when it shouldn't.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/rmap.c | 66 ++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 91619fd70939..d7a0f5121c65 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -886,45 +886,48 @@ struct page_referenced_arg {
 static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 			unsigned long address, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
 	struct page_referenced_arg *pra = arg;
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
 	int referenced = 0;
 
-	if (!page_check_address_transhuge(page, mm, address, &pmd, &pte, &ptl))
-		return SWAP_AGAIN;
+	while (page_check_walk(&pcw)) {
+		address = pcw.address;
 
-	if (vma->vm_flags & VM_LOCKED) {
-		if (pte)
-			pte_unmap(pte);
-		spin_unlock(ptl);
-		pra->vm_flags |= VM_LOCKED;
-		return SWAP_FAIL; /* To break the loop */
-	}
+		if (vma->vm_flags & VM_LOCKED) {
+			page_check_walk_done(&pcw);
+			pra->vm_flags |= VM_LOCKED;
+			return SWAP_FAIL; /* To break the loop */
+		}
 
-	if (pte) {
-		if (ptep_clear_flush_young_notify(vma, address, pte)) {
-			/*
-			 * Don't treat a reference through a sequentially read
-			 * mapping as such.  If the page has been used in
-			 * another mapping, we will catch it; if this other
-			 * mapping is already gone, the unmap path will have
-			 * set PG_referenced or activated the page.
-			 */
-			if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+		if (pcw.pte) {
+			if (ptep_clear_flush_young_notify(vma, address,
+						pcw.pte)) {
+				/*
+				 * Don't treat a reference through
+				 * a sequentially read mapping as such.
+				 * If the page has been used in another mapping,
+				 * we will catch it; if this other mapping is
+				 * already gone, the unmap path will have set
+				 * PG_referenced or activated the page.
+				 */
+				if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+					referenced++;
+			}
+		} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+			if (pmdp_clear_flush_young_notify(vma, address,
+						pcw.pmd))
 				referenced++;
+		} else {
+			/* unexpected pmd-mapped page? */
+			WARN_ON_ONCE(1);
 		}
-		pte_unmap(pte);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		if (pmdp_clear_flush_young_notify(vma, address, pmd))
-			referenced++;
-	} else {
-		/* unexpected pmd-mapped page? */
-		WARN_ON_ONCE(1);
+
+		pra->mapcount--;
 	}
-	spin_unlock(ptl);
 
 	if (referenced)
 		clear_page_idle(page);
@@ -936,7 +939,6 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 		pra->vm_flags |= vma->vm_flags;
 	}
 
-	pra->mapcount--;
 	if (!pra->mapcount)
 		return SWAP_SUCCESS; /* To break the loop */
 
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 04/12] mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov, Vladimir Davydov

For PTE-mapped THP page_check_address_transhuge() is not adequate: it
cannot find all relevant PTEs, only the first one.i

Let's switch it to page_check_walk().

I don't think it's subject for stable@: it's not fatal.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
---
 mm/page_idle.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/page_idle.c b/mm/page_idle.c
index ae11aa914e55..573d217457cb 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -54,27 +54,27 @@ static int page_idle_clear_pte_refs_one(struct page *page,
 					struct vm_area_struct *vma,
 					unsigned long addr, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = addr,
+	};
 	bool referenced = false;
 
-	if (!page_check_address_transhuge(page, mm, addr, &pmd, &pte, &ptl))
-		return SWAP_AGAIN;
-
-	if (pte) {
-		referenced = ptep_clear_young_notify(vma, addr, pte);
-		pte_unmap(pte);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		referenced = pmdp_clear_young_notify(vma, addr, pmd);
-	} else {
-		/* unexpected pmd-mapped page? */
-		WARN_ON_ONCE(1);
+	while (page_check_walk(&pcw)) {
+		addr = pcw.address;
+		if (pcw.pte) {
+			referenced = ptep_clear_young_notify(vma, addr,
+					pcw.pte);
+		} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+			referenced = pmdp_clear_young_notify(vma, addr,
+					pcw.pmd);
+		} else {
+			/* unexpected pmd-mapped page? */
+			WARN_ON_ONCE(1);
+		}
 	}
 
-	spin_unlock(ptl);
-
 	if (referenced) {
 		clear_page_idle(page);
 		/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 04/12] mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov, Vladimir Davydov

For PTE-mapped THP page_check_address_transhuge() is not adequate: it
cannot find all relevant PTEs, only the first one.i

Let's switch it to page_check_walk().

I don't think it's subject for stable@: it's not fatal.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
---
 mm/page_idle.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/page_idle.c b/mm/page_idle.c
index ae11aa914e55..573d217457cb 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -54,27 +54,27 @@ static int page_idle_clear_pte_refs_one(struct page *page,
 					struct vm_area_struct *vma,
 					unsigned long addr, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = addr,
+	};
 	bool referenced = false;
 
-	if (!page_check_address_transhuge(page, mm, addr, &pmd, &pte, &ptl))
-		return SWAP_AGAIN;
-
-	if (pte) {
-		referenced = ptep_clear_young_notify(vma, addr, pte);
-		pte_unmap(pte);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		referenced = pmdp_clear_young_notify(vma, addr, pmd);
-	} else {
-		/* unexpected pmd-mapped page? */
-		WARN_ON_ONCE(1);
+	while (page_check_walk(&pcw)) {
+		addr = pcw.address;
+		if (pcw.pte) {
+			referenced = ptep_clear_young_notify(vma, addr,
+					pcw.pte);
+		} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+			referenced = pmdp_clear_young_notify(vma, addr,
+					pcw.pmd);
+		} else {
+			/* unexpected pmd-mapped page? */
+			WARN_ON_ONCE(1);
+		}
 	}
 
-	spin_unlock(ptl);
-
 	if (referenced) {
 		clear_page_idle(page);
 		/*
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 05/12] mm, rmap: check all VMAs that PTE-mapped THP can be part of
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

Current rmap code can miss a VMA that maps PTE-mapped THP if the first
suppage of the THP was unmapped from the VMA.

We need to walk rmap for the whole range of offsets that THP covers, not
only the first one.

vma_address() also need to be corrected to check the range instead of
the first subpage.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/internal.h |  9 ++++++---
 mm/rmap.c     | 16 ++++++++++------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 03763f5c42c5..1f90c65df7fb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -333,12 +333,15 @@ __vma_address(struct page *page, struct vm_area_struct *vma)
 static inline unsigned long
 vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long address = __vma_address(page, vma);
+	unsigned long start, end;
+
+	start = __vma_address(page, vma);
+	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
 
 	/* page should be within @vma mapping range */
-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
 
-	return address;
+	return max(start, vma->vm_start);
 }
 
 #else /* !CONFIG_MMU */
diff --git a/mm/rmap.c b/mm/rmap.c
index d7a0f5121c65..3bbf83b32553 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1757,7 +1757,7 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
 		bool locked)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff;
+	pgoff_t pgoff_start, pgoff_end;
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
@@ -1771,8 +1771,10 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
 	if (!anon_vma)
 		return ret;
 
-	pgoff = page_to_pgoff(page);
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+	pgoff_start = page_to_pgoff(page);
+	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
+			pgoff_start, pgoff_end) {
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);
 
@@ -1810,7 +1812,7 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
 		bool locked)
 {
 	struct address_space *mapping = page_mapping(page);
-	pgoff_t pgoff;
+	pgoff_t pgoff_start, pgoff_end;
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
 
@@ -1825,10 +1827,12 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
 	if (!mapping)
 		return ret;
 
-	pgoff = page_to_pgoff(page);
+	pgoff_start = page_to_pgoff(page);
+	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
 	if (!locked)
 		i_mmap_lock_read(mapping);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+	vma_interval_tree_foreach(vma, &mapping->i_mmap,
+			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(page, vma);
 
 		cond_resched();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 05/12] mm, rmap: check all VMAs that PTE-mapped THP can be part of
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

Current rmap code can miss a VMA that maps PTE-mapped THP if the first
suppage of the THP was unmapped from the VMA.

We need to walk rmap for the whole range of offsets that THP covers, not
only the first one.

vma_address() also need to be corrected to check the range instead of
the first subpage.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/internal.h |  9 ++++++---
 mm/rmap.c     | 16 ++++++++++------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 03763f5c42c5..1f90c65df7fb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -333,12 +333,15 @@ __vma_address(struct page *page, struct vm_area_struct *vma)
 static inline unsigned long
 vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long address = __vma_address(page, vma);
+	unsigned long start, end;
+
+	start = __vma_address(page, vma);
+	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
 
 	/* page should be within @vma mapping range */
-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
 
-	return address;
+	return max(start, vma->vm_start);
 }
 
 #else /* !CONFIG_MMU */
diff --git a/mm/rmap.c b/mm/rmap.c
index d7a0f5121c65..3bbf83b32553 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1757,7 +1757,7 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
 		bool locked)
 {
 	struct anon_vma *anon_vma;
-	pgoff_t pgoff;
+	pgoff_t pgoff_start, pgoff_end;
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
@@ -1771,8 +1771,10 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
 	if (!anon_vma)
 		return ret;
 
-	pgoff = page_to_pgoff(page);
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+	pgoff_start = page_to_pgoff(page);
+	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
+			pgoff_start, pgoff_end) {
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);
 
@@ -1810,7 +1812,7 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
 		bool locked)
 {
 	struct address_space *mapping = page_mapping(page);
-	pgoff_t pgoff;
+	pgoff_t pgoff_start, pgoff_end;
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
 
@@ -1825,10 +1827,12 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
 	if (!mapping)
 		return ret;
 
-	pgoff = page_to_pgoff(page);
+	pgoff_start = page_to_pgoff(page);
+	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
 	if (!locked)
 		i_mmap_lock_read(mapping);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+	vma_interval_tree_foreach(vma, &mapping->i_mmap,
+			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(page, vma);
 
 		cond_resched();
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 06/12] mm: convert page_mkclean_one() to page_check_walk()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

PMD handling here is future-proofing, we don't have users yet. ext4 with
huge pages will be the first.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/rmap.c | 66 +++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 3bbf83b32553..41874a6f6cf5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1017,34 +1017,54 @@ int page_referenced(struct page *page,
 static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 			    unsigned long address, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t *pte;
-	spinlock_t *ptl;
-	int ret = 0;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+		.flags = PAGE_CHECK_WALK_SYNC,
+	};
 	int *cleaned = arg;
 
-	pte = page_check_address(page, mm, address, &ptl, 1);
-	if (!pte)
-		goto out;
-
-	if (pte_dirty(*pte) || pte_write(*pte)) {
-		pte_t entry;
+	while (page_check_walk(&pcw)) {
+		int ret = 0;
+		address = pcw.address;
+		if (pcw.pte) {
+			pte_t entry;
+			pte_t *pte = pcw.pte;
+
+			if (!pte_dirty(*pte) && !pte_write(*pte))
+				continue;
+
+			flush_cache_page(vma, address, pte_pfn(*pte));
+			entry = ptep_clear_flush(vma, address, pte);
+			entry = pte_wrprotect(entry);
+			entry = pte_mkclean(entry);
+			set_pte_at(vma->vm_mm, address, pte, entry);
+			ret = 1;
+		} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE)) {
+			pmd_t *pmd = pcw.pmd;
+			pmd_t entry;
+
+			if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
+				continue;
+
+			flush_cache_page(vma, address, page_to_pfn(page));
+			entry = pmdp_huge_clear_flush(vma, address, pmd);
+			entry = pmd_wrprotect(entry);
+			entry = pmd_mkclean(entry);
+			set_pmd_at(vma->vm_mm, address, pmd, entry);
+			ret = 1;
+		} else {
+			/* unexpected pmd-mapped page? */
+			WARN_ON_ONCE(1);
+		}
 
-		flush_cache_page(vma, address, pte_pfn(*pte));
-		entry = ptep_clear_flush(vma, address, pte);
-		entry = pte_wrprotect(entry);
-		entry = pte_mkclean(entry);
-		set_pte_at(mm, address, pte, entry);
-		ret = 1;
+		if (ret) {
+			mmu_notifier_invalidate_page(vma->vm_mm, address);
+			(*cleaned)++;
+		}
 	}
 
-	pte_unmap_unlock(pte, ptl);
-
-	if (ret) {
-		mmu_notifier_invalidate_page(mm, address);
-		(*cleaned)++;
-	}
-out:
 	return SWAP_AGAIN;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 06/12] mm: convert page_mkclean_one() to page_check_walk()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

PMD handling here is future-proofing, we don't have users yet. ext4 with
huge pages will be the first.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/rmap.c | 66 +++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 3bbf83b32553..41874a6f6cf5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1017,34 +1017,54 @@ int page_referenced(struct page *page,
 static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 			    unsigned long address, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t *pte;
-	spinlock_t *ptl;
-	int ret = 0;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+		.flags = PAGE_CHECK_WALK_SYNC,
+	};
 	int *cleaned = arg;
 
-	pte = page_check_address(page, mm, address, &ptl, 1);
-	if (!pte)
-		goto out;
-
-	if (pte_dirty(*pte) || pte_write(*pte)) {
-		pte_t entry;
+	while (page_check_walk(&pcw)) {
+		int ret = 0;
+		address = pcw.address;
+		if (pcw.pte) {
+			pte_t entry;
+			pte_t *pte = pcw.pte;
+
+			if (!pte_dirty(*pte) && !pte_write(*pte))
+				continue;
+
+			flush_cache_page(vma, address, pte_pfn(*pte));
+			entry = ptep_clear_flush(vma, address, pte);
+			entry = pte_wrprotect(entry);
+			entry = pte_mkclean(entry);
+			set_pte_at(vma->vm_mm, address, pte, entry);
+			ret = 1;
+		} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE)) {
+			pmd_t *pmd = pcw.pmd;
+			pmd_t entry;
+
+			if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
+				continue;
+
+			flush_cache_page(vma, address, page_to_pfn(page));
+			entry = pmdp_huge_clear_flush(vma, address, pmd);
+			entry = pmd_wrprotect(entry);
+			entry = pmd_mkclean(entry);
+			set_pmd_at(vma->vm_mm, address, pmd, entry);
+			ret = 1;
+		} else {
+			/* unexpected pmd-mapped page? */
+			WARN_ON_ONCE(1);
+		}
 
-		flush_cache_page(vma, address, pte_pfn(*pte));
-		entry = ptep_clear_flush(vma, address, pte);
-		entry = pte_wrprotect(entry);
-		entry = pte_mkclean(entry);
-		set_pte_at(mm, address, pte, entry);
-		ret = 1;
+		if (ret) {
+			mmu_notifier_invalidate_page(vma->vm_mm, address);
+			(*cleaned)++;
+		}
 	}
 
-	pte_unmap_unlock(pte, ptl);
-
-	if (ret) {
-		mmu_notifier_invalidate_page(mm, address);
-		(*cleaned)++;
-	}
-out:
 	return SWAP_AGAIN;
 }
 
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 07/12] mm: convert try_to_unmap_one() to page_check_walk()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

It also makes freeze_page() as we walk though rmap only once.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |  16 +---
 mm/rmap.c        | 260 ++++++++++++++++++++++++++++---------------------------
 2 files changed, 137 insertions(+), 139 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 16820e001d79..ca7855f857fa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1839,24 +1839,16 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 static void freeze_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
-		TTU_RMAP_LOCKED;
-	int i, ret;
+		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+	int ret;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
 	if (PageAnon(page))
 		ttu_flags |= TTU_MIGRATION;
 
-	/* We only need TTU_SPLIT_HUGE_PMD once */
-	ret = try_to_unmap(page, ttu_flags | TTU_SPLIT_HUGE_PMD);
-	for (i = 1; !ret && i < HPAGE_PMD_NR; i++) {
-		/* Cut short if the page is unmapped */
-		if (page_count(page) == 1)
-			return;
-
-		ret = try_to_unmap(page + i, ttu_flags);
-	}
-	VM_BUG_ON_PAGE(ret, page + i - 1);
+	ret = try_to_unmap(page, ttu_flags);
+	VM_BUG_ON_PAGE(ret, page);
 }
 
 static void unfreeze_page(struct page *page)
diff --git a/mm/rmap.c b/mm/rmap.c
index 41874a6f6cf5..c9a096ffb242 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -607,8 +607,7 @@ void try_to_unmap_flush_dirty(void)
 		try_to_unmap_flush();
 }
 
-static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
-		struct page *page, bool writable)
+static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 
@@ -643,8 +642,7 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
 	return should_defer;
 }
 #else
-static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
-		struct page *page, bool writable)
+static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
 {
 }
 
@@ -1457,155 +1455,163 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	pte_t *pte;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
 	pte_t pteval;
-	spinlock_t *ptl;
+	struct page *subpage;
 	int ret = SWAP_AGAIN;
 	struct rmap_private *rp = arg;
 	enum ttu_flags flags = rp->flags;
 
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
-		goto out;
+		return SWAP_AGAIN;
 
 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
 				flags & TTU_MIGRATION, page);
-		/* check if we have anything to do after split */
-		if (page_mapcount(page) == 0)
-			goto out;
 	}
 
-	pte = page_check_address(page, mm, address, &ptl,
-				 PageTransCompound(page));
-	if (!pte)
-		goto out;
+	while (page_check_walk(&pcw)) {
+		subpage = page - page_to_pfn(page) + pte_pfn(*pcw.pte);
+		address = pcw.address;
 
-	/*
-	 * If the page is mlock()d, we cannot swap it out.
-	 * If it's recently referenced (perhaps page_referenced
-	 * skipped over this mm) then we should reactivate it.
-	 */
-	if (!(flags & TTU_IGNORE_MLOCK)) {
-		if (vma->vm_flags & VM_LOCKED) {
-			/* PTE-mapped THP are never mlocked */
-			if (!PageTransCompound(page)) {
-				/*
-				 * Holding pte lock, we do *not* need
-				 * mmap_sem here
-				 */
-				mlock_vma_page(page);
+		/* Unexpected PMD-mapped THP? */
+		VM_BUG_ON_PAGE(!pcw.pte, page);
+
+		/*
+		 * If the page is mlock()d, we cannot swap it out.
+		 * If it's recently referenced (perhaps page_referenced
+		 * skipped over this mm) then we should reactivate it.
+		 */
+		if (!(flags & TTU_IGNORE_MLOCK)) {
+			if (vma->vm_flags & VM_LOCKED) {
+				/* PTE-mapped THP are never mlocked */
+				if (!PageTransCompound(page)) {
+					/*
+					 * Holding pte lock, we do *not* need
+					 * mmap_sem here
+					 */
+					mlock_vma_page(page);
+				}
+				ret = SWAP_MLOCK;
+				page_check_walk_done(&pcw);
+				break;
 			}
-			ret = SWAP_MLOCK;
-			goto out_unmap;
+			if (flags & TTU_MUNLOCK)
+				continue;
 		}
-		if (flags & TTU_MUNLOCK)
-			goto out_unmap;
-	}
-	if (!(flags & TTU_IGNORE_ACCESS)) {
-		if (ptep_clear_flush_young_notify(vma, address, pte)) {
-			ret = SWAP_FAIL;
-			goto out_unmap;
+
+		if (!(flags & TTU_IGNORE_ACCESS)) {
+			if (ptep_clear_flush_young_notify(vma, address,
+						pcw.pte)) {
+				ret = SWAP_FAIL;
+				page_check_walk_done(&pcw);
+				break;
+			}
 		}
-  	}
 
-	/* Nuke the page table entry. */
-	flush_cache_page(vma, address, page_to_pfn(page));
-	if (should_defer_flush(mm, flags)) {
-		/*
-		 * We clear the PTE but do not flush so potentially a remote
-		 * CPU could still be writing to the page. If the entry was
-		 * previously clean then the architecture must guarantee that
-		 * a clear->dirty transition on a cached TLB entry is written
-		 * through and traps if the PTE is unmapped.
-		 */
-		pteval = ptep_get_and_clear(mm, address, pte);
+		/* Nuke the page table entry. */
+		flush_cache_page(vma, address, pte_to_pfn(pcw.pte));
+		if (should_defer_flush(mm, flags)) {
+			/*
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the page.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
+			 */
+			pteval = ptep_get_and_clear(mm, address, pcw.pte);
+
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+		} else {
+			pteval = ptep_clear_flush(vma, address, pcw.pte);
+		}
 
-		set_tlb_ubc_flush_pending(mm, page, pte_dirty(pteval));
-	} else {
-		pteval = ptep_clear_flush(vma, address, pte);
-	}
+		/* Move the dirty bit to the page. Now the pte is gone. */
+		if (pte_dirty(pteval))
+			set_page_dirty(page);
 
-	/* Move the dirty bit to the physical page now the pte is gone. */
-	if (pte_dirty(pteval))
-		set_page_dirty(page);
+		/* Update high watermark before we lower rss */
+		update_hiwater_rss(mm);
 
-	/* Update high watermark before we lower rss */
-	update_hiwater_rss(mm);
+		if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
+			if (PageHuge(page)) {
+				int nr = 1 << compound_order(page);
+				hugetlb_count_sub(nr, mm);
+			} else {
+				dec_mm_counter(mm, mm_counter(page));
+			}
 
-	if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
-		if (PageHuge(page)) {
-			hugetlb_count_sub(1 << compound_order(page), mm);
-		} else {
+			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+			set_pte_at(mm, address, pcw.pte, pteval);
+		} else if (pte_unused(pteval)) {
+			/*
+			 * The guest indicated that the page content is of no
+			 * interest anymore. Simply discard the pte, vmscan
+			 * will take care of the rest.
+			 */
 			dec_mm_counter(mm, mm_counter(page));
-		}
-		set_pte_at(mm, address, pte,
-			   swp_entry_to_pte(make_hwpoison_entry(page)));
-	} else if (pte_unused(pteval)) {
-		/*
-		 * The guest indicated that the page content is of no
-		 * interest anymore. Simply discard the pte, vmscan
-		 * will take care of the rest.
-		 */
-		dec_mm_counter(mm, mm_counter(page));
-	} else if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION)) {
-		swp_entry_t entry;
-		pte_t swp_pte;
-		/*
-		 * Store the pfn of the page in a special migration
-		 * pte. do_swap_page() will wait until the migration
-		 * pte is removed and then restart fault handling.
-		 */
-		entry = make_migration_entry(page, pte_write(pteval));
-		swp_pte = swp_entry_to_pte(entry);
-		if (pte_soft_dirty(pteval))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		set_pte_at(mm, address, pte, swp_pte);
-	} else if (PageAnon(page)) {
-		swp_entry_t entry = { .val = page_private(page) };
-		pte_t swp_pte;
-		/*
-		 * Store the swap location in the pte.
-		 * See handle_pte_fault() ...
-		 */
-		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+		} else if (IS_ENABLED(CONFIG_MIGRATION) &&
+				(flags & TTU_MIGRATION)) {
+			swp_entry_t entry;
+			pte_t swp_pte;
+			/*
+			 * Store the pfn of the page in a special migration
+			 * pte. do_swap_page() will wait until the migration
+			 * pte is removed and then restart fault handling.
+			 */
+			entry = make_migration_entry(subpage,
+					pte_write(pteval));
+			swp_pte = swp_entry_to_pte(entry);
+			if (pte_soft_dirty(pteval))
+				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			set_pte_at(mm, address, pcw.pte, swp_pte);
+		} else if (PageAnon(page)) {
+			swp_entry_t entry = { .val = page_private(subpage) };
+			pte_t swp_pte;
+			/*
+			 * Store the swap location in the pte.
+			 * See handle_pte_fault() ...
+			 */
+			VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+
+			if (!PageDirty(page) && (flags & TTU_LZFREE)) {
+				/* It's a freeable page by MADV_FREE */
+				dec_mm_counter(mm, MM_ANONPAGES);
+				rp->lazyfreed++;
+				goto discard;
+			}
 
-		if (!PageDirty(page) && (flags & TTU_LZFREE)) {
-			/* It's a freeable page by MADV_FREE */
+			if (swap_duplicate(entry) < 0) {
+				set_pte_at(mm, address, pcw.pte, pteval);
+				ret = SWAP_FAIL;
+				page_check_walk_done(&pcw);
+				break;
+			}
+			if (list_empty(&mm->mmlist)) {
+				spin_lock(&mmlist_lock);
+				if (list_empty(&mm->mmlist))
+					list_add(&mm->mmlist, &init_mm.mmlist);
+				spin_unlock(&mmlist_lock);
+			}
 			dec_mm_counter(mm, MM_ANONPAGES);
-			rp->lazyfreed++;
-			goto discard;
-		}
-
-		if (swap_duplicate(entry) < 0) {
-			set_pte_at(mm, address, pte, pteval);
-			ret = SWAP_FAIL;
-			goto out_unmap;
-		}
-		if (list_empty(&mm->mmlist)) {
-			spin_lock(&mmlist_lock);
-			if (list_empty(&mm->mmlist))
-				list_add(&mm->mmlist, &init_mm.mmlist);
-			spin_unlock(&mmlist_lock);
-		}
-		dec_mm_counter(mm, MM_ANONPAGES);
-		inc_mm_counter(mm, MM_SWAPENTS);
-		swp_pte = swp_entry_to_pte(entry);
-		if (pte_soft_dirty(pteval))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		set_pte_at(mm, address, pte, swp_pte);
-	} else
-		dec_mm_counter(mm, mm_counter_file(page));
-
+			inc_mm_counter(mm, MM_SWAPENTS);
+			swp_pte = swp_entry_to_pte(entry);
+			if (pte_soft_dirty(pteval))
+				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			set_pte_at(mm, address, pcw.pte, swp_pte);
+		} else
+			dec_mm_counter(mm, mm_counter_file(page));
 discard:
-	page_remove_rmap(page, PageHuge(page));
-	put_page(page);
-
-out_unmap:
-	pte_unmap_unlock(pte, ptl);
-	if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
+		page_remove_rmap(subpage, PageHuge(page));
+		put_page(page);
 		mmu_notifier_invalidate_page(mm, address);
-out:
+	}
 	return ret;
 }
 
@@ -1630,7 +1636,7 @@ static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg)
 
 static int page_mapcount_is_zero(struct page *page)
 {
-	return !page_mapcount(page);
+	return !total_mapcount(page);
 }
 
 /**
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 07/12] mm: convert try_to_unmap_one() to page_check_walk()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

It also makes freeze_page() as we walk though rmap only once.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |  16 +---
 mm/rmap.c        | 260 ++++++++++++++++++++++++++++---------------------------
 2 files changed, 137 insertions(+), 139 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 16820e001d79..ca7855f857fa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1839,24 +1839,16 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 static void freeze_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
-		TTU_RMAP_LOCKED;
-	int i, ret;
+		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+	int ret;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
 	if (PageAnon(page))
 		ttu_flags |= TTU_MIGRATION;
 
-	/* We only need TTU_SPLIT_HUGE_PMD once */
-	ret = try_to_unmap(page, ttu_flags | TTU_SPLIT_HUGE_PMD);
-	for (i = 1; !ret && i < HPAGE_PMD_NR; i++) {
-		/* Cut short if the page is unmapped */
-		if (page_count(page) == 1)
-			return;
-
-		ret = try_to_unmap(page + i, ttu_flags);
-	}
-	VM_BUG_ON_PAGE(ret, page + i - 1);
+	ret = try_to_unmap(page, ttu_flags);
+	VM_BUG_ON_PAGE(ret, page);
 }
 
 static void unfreeze_page(struct page *page)
diff --git a/mm/rmap.c b/mm/rmap.c
index 41874a6f6cf5..c9a096ffb242 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -607,8 +607,7 @@ void try_to_unmap_flush_dirty(void)
 		try_to_unmap_flush();
 }
 
-static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
-		struct page *page, bool writable)
+static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 
@@ -643,8 +642,7 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
 	return should_defer;
 }
 #else
-static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
-		struct page *page, bool writable)
+static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
 {
 }
 
@@ -1457,155 +1455,163 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	pte_t *pte;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
 	pte_t pteval;
-	spinlock_t *ptl;
+	struct page *subpage;
 	int ret = SWAP_AGAIN;
 	struct rmap_private *rp = arg;
 	enum ttu_flags flags = rp->flags;
 
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
-		goto out;
+		return SWAP_AGAIN;
 
 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
 				flags & TTU_MIGRATION, page);
-		/* check if we have anything to do after split */
-		if (page_mapcount(page) == 0)
-			goto out;
 	}
 
-	pte = page_check_address(page, mm, address, &ptl,
-				 PageTransCompound(page));
-	if (!pte)
-		goto out;
+	while (page_check_walk(&pcw)) {
+		subpage = page - page_to_pfn(page) + pte_pfn(*pcw.pte);
+		address = pcw.address;
 
-	/*
-	 * If the page is mlock()d, we cannot swap it out.
-	 * If it's recently referenced (perhaps page_referenced
-	 * skipped over this mm) then we should reactivate it.
-	 */
-	if (!(flags & TTU_IGNORE_MLOCK)) {
-		if (vma->vm_flags & VM_LOCKED) {
-			/* PTE-mapped THP are never mlocked */
-			if (!PageTransCompound(page)) {
-				/*
-				 * Holding pte lock, we do *not* need
-				 * mmap_sem here
-				 */
-				mlock_vma_page(page);
+		/* Unexpected PMD-mapped THP? */
+		VM_BUG_ON_PAGE(!pcw.pte, page);
+
+		/*
+		 * If the page is mlock()d, we cannot swap it out.
+		 * If it's recently referenced (perhaps page_referenced
+		 * skipped over this mm) then we should reactivate it.
+		 */
+		if (!(flags & TTU_IGNORE_MLOCK)) {
+			if (vma->vm_flags & VM_LOCKED) {
+				/* PTE-mapped THP are never mlocked */
+				if (!PageTransCompound(page)) {
+					/*
+					 * Holding pte lock, we do *not* need
+					 * mmap_sem here
+					 */
+					mlock_vma_page(page);
+				}
+				ret = SWAP_MLOCK;
+				page_check_walk_done(&pcw);
+				break;
 			}
-			ret = SWAP_MLOCK;
-			goto out_unmap;
+			if (flags & TTU_MUNLOCK)
+				continue;
 		}
-		if (flags & TTU_MUNLOCK)
-			goto out_unmap;
-	}
-	if (!(flags & TTU_IGNORE_ACCESS)) {
-		if (ptep_clear_flush_young_notify(vma, address, pte)) {
-			ret = SWAP_FAIL;
-			goto out_unmap;
+
+		if (!(flags & TTU_IGNORE_ACCESS)) {
+			if (ptep_clear_flush_young_notify(vma, address,
+						pcw.pte)) {
+				ret = SWAP_FAIL;
+				page_check_walk_done(&pcw);
+				break;
+			}
 		}
-  	}
 
-	/* Nuke the page table entry. */
-	flush_cache_page(vma, address, page_to_pfn(page));
-	if (should_defer_flush(mm, flags)) {
-		/*
-		 * We clear the PTE but do not flush so potentially a remote
-		 * CPU could still be writing to the page. If the entry was
-		 * previously clean then the architecture must guarantee that
-		 * a clear->dirty transition on a cached TLB entry is written
-		 * through and traps if the PTE is unmapped.
-		 */
-		pteval = ptep_get_and_clear(mm, address, pte);
+		/* Nuke the page table entry. */
+		flush_cache_page(vma, address, pte_to_pfn(pcw.pte));
+		if (should_defer_flush(mm, flags)) {
+			/*
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the page.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
+			 */
+			pteval = ptep_get_and_clear(mm, address, pcw.pte);
+
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+		} else {
+			pteval = ptep_clear_flush(vma, address, pcw.pte);
+		}
 
-		set_tlb_ubc_flush_pending(mm, page, pte_dirty(pteval));
-	} else {
-		pteval = ptep_clear_flush(vma, address, pte);
-	}
+		/* Move the dirty bit to the page. Now the pte is gone. */
+		if (pte_dirty(pteval))
+			set_page_dirty(page);
 
-	/* Move the dirty bit to the physical page now the pte is gone. */
-	if (pte_dirty(pteval))
-		set_page_dirty(page);
+		/* Update high watermark before we lower rss */
+		update_hiwater_rss(mm);
 
-	/* Update high watermark before we lower rss */
-	update_hiwater_rss(mm);
+		if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
+			if (PageHuge(page)) {
+				int nr = 1 << compound_order(page);
+				hugetlb_count_sub(nr, mm);
+			} else {
+				dec_mm_counter(mm, mm_counter(page));
+			}
 
-	if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
-		if (PageHuge(page)) {
-			hugetlb_count_sub(1 << compound_order(page), mm);
-		} else {
+			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+			set_pte_at(mm, address, pcw.pte, pteval);
+		} else if (pte_unused(pteval)) {
+			/*
+			 * The guest indicated that the page content is of no
+			 * interest anymore. Simply discard the pte, vmscan
+			 * will take care of the rest.
+			 */
 			dec_mm_counter(mm, mm_counter(page));
-		}
-		set_pte_at(mm, address, pte,
-			   swp_entry_to_pte(make_hwpoison_entry(page)));
-	} else if (pte_unused(pteval)) {
-		/*
-		 * The guest indicated that the page content is of no
-		 * interest anymore. Simply discard the pte, vmscan
-		 * will take care of the rest.
-		 */
-		dec_mm_counter(mm, mm_counter(page));
-	} else if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION)) {
-		swp_entry_t entry;
-		pte_t swp_pte;
-		/*
-		 * Store the pfn of the page in a special migration
-		 * pte. do_swap_page() will wait until the migration
-		 * pte is removed and then restart fault handling.
-		 */
-		entry = make_migration_entry(page, pte_write(pteval));
-		swp_pte = swp_entry_to_pte(entry);
-		if (pte_soft_dirty(pteval))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		set_pte_at(mm, address, pte, swp_pte);
-	} else if (PageAnon(page)) {
-		swp_entry_t entry = { .val = page_private(page) };
-		pte_t swp_pte;
-		/*
-		 * Store the swap location in the pte.
-		 * See handle_pte_fault() ...
-		 */
-		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+		} else if (IS_ENABLED(CONFIG_MIGRATION) &&
+				(flags & TTU_MIGRATION)) {
+			swp_entry_t entry;
+			pte_t swp_pte;
+			/*
+			 * Store the pfn of the page in a special migration
+			 * pte. do_swap_page() will wait until the migration
+			 * pte is removed and then restart fault handling.
+			 */
+			entry = make_migration_entry(subpage,
+					pte_write(pteval));
+			swp_pte = swp_entry_to_pte(entry);
+			if (pte_soft_dirty(pteval))
+				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			set_pte_at(mm, address, pcw.pte, swp_pte);
+		} else if (PageAnon(page)) {
+			swp_entry_t entry = { .val = page_private(subpage) };
+			pte_t swp_pte;
+			/*
+			 * Store the swap location in the pte.
+			 * See handle_pte_fault() ...
+			 */
+			VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+
+			if (!PageDirty(page) && (flags & TTU_LZFREE)) {
+				/* It's a freeable page by MADV_FREE */
+				dec_mm_counter(mm, MM_ANONPAGES);
+				rp->lazyfreed++;
+				goto discard;
+			}
 
-		if (!PageDirty(page) && (flags & TTU_LZFREE)) {
-			/* It's a freeable page by MADV_FREE */
+			if (swap_duplicate(entry) < 0) {
+				set_pte_at(mm, address, pcw.pte, pteval);
+				ret = SWAP_FAIL;
+				page_check_walk_done(&pcw);
+				break;
+			}
+			if (list_empty(&mm->mmlist)) {
+				spin_lock(&mmlist_lock);
+				if (list_empty(&mm->mmlist))
+					list_add(&mm->mmlist, &init_mm.mmlist);
+				spin_unlock(&mmlist_lock);
+			}
 			dec_mm_counter(mm, MM_ANONPAGES);
-			rp->lazyfreed++;
-			goto discard;
-		}
-
-		if (swap_duplicate(entry) < 0) {
-			set_pte_at(mm, address, pte, pteval);
-			ret = SWAP_FAIL;
-			goto out_unmap;
-		}
-		if (list_empty(&mm->mmlist)) {
-			spin_lock(&mmlist_lock);
-			if (list_empty(&mm->mmlist))
-				list_add(&mm->mmlist, &init_mm.mmlist);
-			spin_unlock(&mmlist_lock);
-		}
-		dec_mm_counter(mm, MM_ANONPAGES);
-		inc_mm_counter(mm, MM_SWAPENTS);
-		swp_pte = swp_entry_to_pte(entry);
-		if (pte_soft_dirty(pteval))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		set_pte_at(mm, address, pte, swp_pte);
-	} else
-		dec_mm_counter(mm, mm_counter_file(page));
-
+			inc_mm_counter(mm, MM_SWAPENTS);
+			swp_pte = swp_entry_to_pte(entry);
+			if (pte_soft_dirty(pteval))
+				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			set_pte_at(mm, address, pcw.pte, swp_pte);
+		} else
+			dec_mm_counter(mm, mm_counter_file(page));
 discard:
-	page_remove_rmap(page, PageHuge(page));
-	put_page(page);
-
-out_unmap:
-	pte_unmap_unlock(pte, ptl);
-	if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
+		page_remove_rmap(subpage, PageHuge(page));
+		put_page(page);
 		mmu_notifier_invalidate_page(mm, address);
-out:
+	}
 	return ret;
 }
 
@@ -1630,7 +1636,7 @@ static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg)
 
 static int page_mapcount_is_zero(struct page *page)
 {
-	return !page_mapcount(page);
+	return !total_mapcount(page);
 }
 
 /**
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 08/12] mm, ksm: convert write_protect_page() to page_check_walk()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/ksm.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 9ae6011a41f8..6653ca186cfe 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -850,33 +850,35 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 			      pte_t *orig_pte)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long addr;
-	pte_t *ptep;
-	spinlock_t *ptl;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+	};
 	int swapped;
 	int err = -EFAULT;
 	unsigned long mmun_start;	/* For mmu_notifiers */
 	unsigned long mmun_end;		/* For mmu_notifiers */
 
-	addr = page_address_in_vma(page, vma);
-	if (addr == -EFAULT)
+	pcw.address = page_address_in_vma(page, vma);
+	if (pcw.address == -EFAULT)
 		goto out;
 
 	BUG_ON(PageTransCompound(page));
 
-	mmun_start = addr;
-	mmun_end   = addr + PAGE_SIZE;
+	mmun_start = pcw.address;
+	mmun_end   = pcw.address + PAGE_SIZE;
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
-	ptep = page_check_address(page, mm, addr, &ptl, 0);
-	if (!ptep)
+	if (!page_check_walk(&pcw))
 		goto out_mn;
+	if (WARN_ONCE(!pcw.pte, "Unexpected PMD mapping?"))
+		goto out_unlock;
 
-	if (pte_write(*ptep) || pte_dirty(*ptep)) {
+	if (pte_write(*pcw.pte) || pte_dirty(*pcw.pte)) {
 		pte_t entry;
 
 		swapped = PageSwapCache(page);
-		flush_cache_page(vma, addr, page_to_pfn(page));
+		flush_cache_page(vma, pcw.address, page_to_pfn(page));
 		/*
 		 * Ok this is tricky, when get_user_pages_fast() run it doesn't
 		 * take any lock, therefore the check that we are going to make
@@ -886,25 +888,25 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 		 * this assure us that no O_DIRECT can happen after the check
 		 * or in the middle of the check.
 		 */
-		entry = ptep_clear_flush_notify(vma, addr, ptep);
+		entry = ptep_clear_flush_notify(vma, pcw.address, pcw.pte);
 		/*
 		 * Check that no O_DIRECT or similar I/O is in progress on the
 		 * page
 		 */
 		if (page_mapcount(page) + 1 + swapped != page_count(page)) {
-			set_pte_at(mm, addr, ptep, entry);
+			set_pte_at(mm, pcw.address, pcw.pte, entry);
 			goto out_unlock;
 		}
 		if (pte_dirty(entry))
 			set_page_dirty(page);
 		entry = pte_mkclean(pte_wrprotect(entry));
-		set_pte_at_notify(mm, addr, ptep, entry);
+		set_pte_at_notify(mm, pcw.address, pcw.pte, entry);
 	}
-	*orig_pte = *ptep;
+	*orig_pte = *pcw.pte;
 	err = 0;
 
 out_unlock:
-	pte_unmap_unlock(ptep, ptl);
+	page_check_walk_done(&pcw);
 out_mn:
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 out:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 08/12] mm, ksm: convert write_protect_page() to page_check_walk()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/ksm.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 9ae6011a41f8..6653ca186cfe 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -850,33 +850,35 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 			      pte_t *orig_pte)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long addr;
-	pte_t *ptep;
-	spinlock_t *ptl;
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+	};
 	int swapped;
 	int err = -EFAULT;
 	unsigned long mmun_start;	/* For mmu_notifiers */
 	unsigned long mmun_end;		/* For mmu_notifiers */
 
-	addr = page_address_in_vma(page, vma);
-	if (addr == -EFAULT)
+	pcw.address = page_address_in_vma(page, vma);
+	if (pcw.address == -EFAULT)
 		goto out;
 
 	BUG_ON(PageTransCompound(page));
 
-	mmun_start = addr;
-	mmun_end   = addr + PAGE_SIZE;
+	mmun_start = pcw.address;
+	mmun_end   = pcw.address + PAGE_SIZE;
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
-	ptep = page_check_address(page, mm, addr, &ptl, 0);
-	if (!ptep)
+	if (!page_check_walk(&pcw))
 		goto out_mn;
+	if (WARN_ONCE(!pcw.pte, "Unexpected PMD mapping?"))
+		goto out_unlock;
 
-	if (pte_write(*ptep) || pte_dirty(*ptep)) {
+	if (pte_write(*pcw.pte) || pte_dirty(*pcw.pte)) {
 		pte_t entry;
 
 		swapped = PageSwapCache(page);
-		flush_cache_page(vma, addr, page_to_pfn(page));
+		flush_cache_page(vma, pcw.address, page_to_pfn(page));
 		/*
 		 * Ok this is tricky, when get_user_pages_fast() run it doesn't
 		 * take any lock, therefore the check that we are going to make
@@ -886,25 +888,25 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 		 * this assure us that no O_DIRECT can happen after the check
 		 * or in the middle of the check.
 		 */
-		entry = ptep_clear_flush_notify(vma, addr, ptep);
+		entry = ptep_clear_flush_notify(vma, pcw.address, pcw.pte);
 		/*
 		 * Check that no O_DIRECT or similar I/O is in progress on the
 		 * page
 		 */
 		if (page_mapcount(page) + 1 + swapped != page_count(page)) {
-			set_pte_at(mm, addr, ptep, entry);
+			set_pte_at(mm, pcw.address, pcw.pte, entry);
 			goto out_unlock;
 		}
 		if (pte_dirty(entry))
 			set_page_dirty(page);
 		entry = pte_mkclean(pte_wrprotect(entry));
-		set_pte_at_notify(mm, addr, ptep, entry);
+		set_pte_at_notify(mm, pcw.address, pcw.pte, entry);
 	}
-	*orig_pte = *ptep;
+	*orig_pte = *pcw.pte;
 	err = 0;
 
 out_unlock:
-	pte_unmap_unlock(ptep, ptl);
+	page_check_walk_done(&pcw);
 out_mn:
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 out:
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 09/12] mm, uprobes: convert __replace_page() to page_check_walk()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 kernel/events/uprobes.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1e65c79e52a6..6dbaa93b22fa 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -153,14 +153,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 				struct page *old_page, struct page *new_page)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	spinlock_t *ptl;
-	pte_t *ptep;
+	struct page_check_walk pcw = {
+		.page = old_page,
+		.vma = vma,
+		.address = addr,
+	};
 	int err;
 	/* For mmu_notifiers */
 	const unsigned long mmun_start = addr;
 	const unsigned long mmun_end   = addr + PAGE_SIZE;
 	struct mem_cgroup *memcg;
 
+	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
+
 	err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg,
 			false);
 	if (err)
@@ -171,11 +176,11 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	err = -EAGAIN;
-	ptep = page_check_address(old_page, mm, addr, &ptl, 0);
-	if (!ptep) {
+	if (!page_check_walk(&pcw)) {
 		mem_cgroup_cancel_charge(new_page, memcg, false);
 		goto unlock;
 	}
+	VM_BUG_ON_PAGE(addr != pcw.address, old_page);
 
 	get_page(new_page);
 	page_add_new_anon_rmap(new_page, vma, addr, false);
@@ -187,14 +192,15 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}
 
-	flush_cache_page(vma, addr, pte_pfn(*ptep));
-	ptep_clear_flush_notify(vma, addr, ptep);
-	set_pte_at_notify(mm, addr, ptep, mk_pte(new_page, vma->vm_page_prot));
+	flush_cache_page(vma, addr, pte_pfn(*pcw.pte));
+	ptep_clear_flush_notify(vma, addr, pcw.pte);
+	set_pte_at_notify(mm, addr, pcw.pte,
+			mk_pte(new_page, vma->vm_page_prot));
 
 	page_remove_rmap(old_page, false);
 	if (!page_mapped(old_page))
 		try_to_free_swap(old_page);
-	pte_unmap_unlock(ptep, ptl);
+	page_check_walk_done(&pcw);
 
 	if (vma->vm_flags & VM_LOCKED)
 		munlock_vma_page(old_page);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 09/12] mm, uprobes: convert __replace_page() to page_check_walk()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 kernel/events/uprobes.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1e65c79e52a6..6dbaa93b22fa 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -153,14 +153,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 				struct page *old_page, struct page *new_page)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	spinlock_t *ptl;
-	pte_t *ptep;
+	struct page_check_walk pcw = {
+		.page = old_page,
+		.vma = vma,
+		.address = addr,
+	};
 	int err;
 	/* For mmu_notifiers */
 	const unsigned long mmun_start = addr;
 	const unsigned long mmun_end   = addr + PAGE_SIZE;
 	struct mem_cgroup *memcg;
 
+	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
+
 	err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg,
 			false);
 	if (err)
@@ -171,11 +176,11 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	err = -EAGAIN;
-	ptep = page_check_address(old_page, mm, addr, &ptl, 0);
-	if (!ptep) {
+	if (!page_check_walk(&pcw)) {
 		mem_cgroup_cancel_charge(new_page, memcg, false);
 		goto unlock;
 	}
+	VM_BUG_ON_PAGE(addr != pcw.address, old_page);
 
 	get_page(new_page);
 	page_add_new_anon_rmap(new_page, vma, addr, false);
@@ -187,14 +192,15 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}
 
-	flush_cache_page(vma, addr, pte_pfn(*ptep));
-	ptep_clear_flush_notify(vma, addr, ptep);
-	set_pte_at_notify(mm, addr, ptep, mk_pte(new_page, vma->vm_page_prot));
+	flush_cache_page(vma, addr, pte_pfn(*pcw.pte));
+	ptep_clear_flush_notify(vma, addr, pcw.pte);
+	set_pte_at_notify(mm, addr, pcw.pte,
+			mk_pte(new_page, vma->vm_page_prot));
 
 	page_remove_rmap(old_page, false);
 	if (!page_mapped(old_page))
 		try_to_free_swap(old_page);
-	pte_unmap_unlock(ptep, ptl);
+	page_check_walk_done(&pcw);
 
 	if (vma->vm_flags & VM_LOCKED)
 		munlock_vma_page(old_page);
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 10/12] mm: convert page_mapped_in_vma() to page_check_walk()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/page_check.c | 30 ++++++++++++++++++++++++++++++
 mm/rmap.c       | 26 --------------------------
 2 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/mm/page_check.c b/mm/page_check.c
index d4b3536a6bf2..5a544ca382a7 100644
--- a/mm/page_check.c
+++ b/mm/page_check.c
@@ -146,3 +146,33 @@ next_pte:	do {
 		}
 	}
 }
+
+/**
+ * page_mapped_in_vma - check whether a page is really mapped in a VMA
+ * @page: the page to test
+ * @vma: the VMA to test
+ *
+ * Returns 1 if the page is mapped into the page tables of the VMA, 0
+ * if the page is not mapped into the page tables of this VMA.  Only
+ * valid for normal file or anonymous VMAs.
+ */
+int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
+{
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.flags = PAGE_CHECK_WALK_SYNC,
+	};
+	unsigned long start, end;
+
+	start = __vma_address(page, vma);
+	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+
+	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+		return 0;
+	pcw.address = max(start, vma->vm_start);
+	if (!page_check_walk(&pcw))
+		return 0;
+	page_check_walk_done(&pcw);
+	return 1;
+}
diff --git a/mm/rmap.c b/mm/rmap.c
index c9a096ffb242..cb34fd68a23a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -756,32 +756,6 @@ pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
 	return NULL;
 }
 
-/**
- * page_mapped_in_vma - check whether a page is really mapped in a VMA
- * @page: the page to test
- * @vma: the VMA to test
- *
- * Returns 1 if the page is mapped into the page tables of the VMA, 0
- * if the page is not mapped into the page tables of this VMA.  Only
- * valid for normal file or anonymous VMAs.
- */
-int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
-{
-	unsigned long address;
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	address = __vma_address(page, vma);
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
-		return 0;
-	pte = page_check_address(page, vma->vm_mm, address, &ptl, 1);
-	if (!pte)			/* the page is not in this mm */
-		return 0;
-	pte_unmap_unlock(pte, ptl);
-
-	return 1;
-}
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Check that @page is mapped at @address into @mm. In contrast to
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 10/12] mm: convert page_mapped_in_vma() to page_check_walk()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

For consistency, it worth converting all page_check_address() to
page_check_walk(), so we could drop the former.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/page_check.c | 30 ++++++++++++++++++++++++++++++
 mm/rmap.c       | 26 --------------------------
 2 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/mm/page_check.c b/mm/page_check.c
index d4b3536a6bf2..5a544ca382a7 100644
--- a/mm/page_check.c
+++ b/mm/page_check.c
@@ -146,3 +146,33 @@ next_pte:	do {
 		}
 	}
 }
+
+/**
+ * page_mapped_in_vma - check whether a page is really mapped in a VMA
+ * @page: the page to test
+ * @vma: the VMA to test
+ *
+ * Returns 1 if the page is mapped into the page tables of the VMA, 0
+ * if the page is not mapped into the page tables of this VMA.  Only
+ * valid for normal file or anonymous VMAs.
+ */
+int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
+{
+	struct page_check_walk pcw = {
+		.page = page,
+		.vma = vma,
+		.flags = PAGE_CHECK_WALK_SYNC,
+	};
+	unsigned long start, end;
+
+	start = __vma_address(page, vma);
+	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+
+	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+		return 0;
+	pcw.address = max(start, vma->vm_start);
+	if (!page_check_walk(&pcw))
+		return 0;
+	page_check_walk_done(&pcw);
+	return 1;
+}
diff --git a/mm/rmap.c b/mm/rmap.c
index c9a096ffb242..cb34fd68a23a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -756,32 +756,6 @@ pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
 	return NULL;
 }
 
-/**
- * page_mapped_in_vma - check whether a page is really mapped in a VMA
- * @page: the page to test
- * @vma: the VMA to test
- *
- * Returns 1 if the page is mapped into the page tables of the VMA, 0
- * if the page is not mapped into the page tables of this VMA.  Only
- * valid for normal file or anonymous VMAs.
- */
-int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
-{
-	unsigned long address;
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	address = __vma_address(page, vma);
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
-		return 0;
-	pte = page_check_address(page, vma->vm_mm, address, &ptl, 1);
-	if (!pte)			/* the page is not in this mm */
-		return 0;
-	pte_unmap_unlock(pte, ptl);
-
-	return 1;
-}
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Check that @page is mapped at @address into @mm. In contrast to
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 11/12] mm: drop page_check_address{,_transhuge}
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

All users are gone. Let's drop them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  36 --------------
 mm/rmap.c            | 138 ---------------------------------------------------
 2 files changed, 174 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 474279810742..74113df9418d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -196,42 +196,6 @@ int page_referenced(struct page *, int is_locked,
 
 int try_to_unmap(struct page *, enum ttu_flags flags);
 
-/*
- * Used by uprobes to replace a userspace page safely
- */
-pte_t *__page_check_address(struct page *, struct mm_struct *,
-				unsigned long, spinlock_t **, int);
-
-static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
-					unsigned long address,
-					spinlock_t **ptlp, int sync)
-{
-	pte_t *ptep;
-
-	__cond_lock(*ptlp, ptep = __page_check_address(page, mm, address,
-						       ptlp, sync));
-	return ptep;
-}
-
-/*
- * Used by idle page tracking to check if a page was referenced via page
- * tables.
- */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-bool page_check_address_transhuge(struct page *page, struct mm_struct *mm,
-				  unsigned long address, pmd_t **pmdp,
-				  pte_t **ptep, spinlock_t **ptlp);
-#else
-static inline bool page_check_address_transhuge(struct page *page,
-				struct mm_struct *mm, unsigned long address,
-				pmd_t **pmdp, pte_t **ptep, spinlock_t **ptlp)
-{
-	*ptep = page_check_address(page, mm, address, ptlp, 0);
-	*pmdp = NULL;
-	return !!*ptep;
-}
-#endif
-
 /* Avoid racy checks */
 #define PAGE_CHECK_WALK_SYNC		(1 << 0)
 /* Look for migarion entries rather than present ptes */
diff --git a/mm/rmap.c b/mm/rmap.c
index cb34fd68a23a..7106eb9b37a8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -708,144 +708,6 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 	return pmd;
 }
 
-/*
- * Check that @page is mapped at @address into @mm.
- *
- * If @sync is false, page_check_address may perform a racy check to avoid
- * the page table lock when the pte is not present (helpful when reclaiming
- * highly shared pages).
- *
- * On success returns with pte mapped and locked.
- */
-pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
-			  unsigned long address, spinlock_t **ptlp, int sync)
-{
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	if (unlikely(PageHuge(page))) {
-		/* when pud is not present, pte will be NULL */
-		pte = huge_pte_offset(mm, address);
-		if (!pte)
-			return NULL;
-
-		ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
-		goto check;
-	}
-
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd)
-		return NULL;
-
-	pte = pte_offset_map(pmd, address);
-	/* Make a quick check before getting the lock */
-	if (!sync && !pte_present(*pte)) {
-		pte_unmap(pte);
-		return NULL;
-	}
-
-	ptl = pte_lockptr(mm, pmd);
-check:
-	spin_lock(ptl);
-	if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
-		*ptlp = ptl;
-		return pte;
-	}
-	pte_unmap_unlock(pte, ptl);
-	return NULL;
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/*
- * Check that @page is mapped at @address into @mm. In contrast to
- * page_check_address(), this function can handle transparent huge pages.
- *
- * On success returns true with pte mapped and locked. For PMD-mapped
- * transparent huge pages *@ptep is set to NULL.
- */
-bool page_check_address_transhuge(struct page *page, struct mm_struct *mm,
-				  unsigned long address, pmd_t **pmdp,
-				  pte_t **ptep, spinlock_t **ptlp)
-{
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	if (unlikely(PageHuge(page))) {
-		/* when pud is not present, pte will be NULL */
-		pte = huge_pte_offset(mm, address);
-		if (!pte)
-			return false;
-
-		ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
-		pmd = NULL;
-		goto check_pte;
-	}
-
-	pgd = pgd_offset(mm, address);
-	if (!pgd_present(*pgd))
-		return false;
-	pud = pud_offset(pgd, address);
-	if (!pud_present(*pud))
-		return false;
-	pmd = pmd_offset(pud, address);
-
-	if (pmd_trans_huge(*pmd)) {
-		ptl = pmd_lock(mm, pmd);
-		if (!pmd_present(*pmd))
-			goto unlock_pmd;
-		if (unlikely(!pmd_trans_huge(*pmd))) {
-			spin_unlock(ptl);
-			goto map_pte;
-		}
-
-		if (pmd_page(*pmd) != page)
-			goto unlock_pmd;
-
-		pte = NULL;
-		goto found;
-unlock_pmd:
-		spin_unlock(ptl);
-		return false;
-	} else {
-		pmd_t pmde = *pmd;
-
-		barrier();
-		if (!pmd_present(pmde) || pmd_trans_huge(pmde))
-			return false;
-	}
-map_pte:
-	pte = pte_offset_map(pmd, address);
-	if (!pte_present(*pte)) {
-		pte_unmap(pte);
-		return false;
-	}
-
-	ptl = pte_lockptr(mm, pmd);
-check_pte:
-	spin_lock(ptl);
-
-	if (!pte_present(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		return false;
-	}
-
-	/* THP can be referenced by any subpage */
-	if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
-		pte_unmap_unlock(pte, ptl);
-		return false;
-	}
-found:
-	*ptep = pte;
-	*pmdp = pmd;
-	*ptlp = ptl;
-	return true;
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
 struct page_referenced_arg {
 	int mapcount;
 	int referenced;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 11/12] mm: drop page_check_address{,_transhuge}
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

All users are gone. Let's drop them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  36 --------------
 mm/rmap.c            | 138 ---------------------------------------------------
 2 files changed, 174 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 474279810742..74113df9418d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -196,42 +196,6 @@ int page_referenced(struct page *, int is_locked,
 
 int try_to_unmap(struct page *, enum ttu_flags flags);
 
-/*
- * Used by uprobes to replace a userspace page safely
- */
-pte_t *__page_check_address(struct page *, struct mm_struct *,
-				unsigned long, spinlock_t **, int);
-
-static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
-					unsigned long address,
-					spinlock_t **ptlp, int sync)
-{
-	pte_t *ptep;
-
-	__cond_lock(*ptlp, ptep = __page_check_address(page, mm, address,
-						       ptlp, sync));
-	return ptep;
-}
-
-/*
- * Used by idle page tracking to check if a page was referenced via page
- * tables.
- */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-bool page_check_address_transhuge(struct page *page, struct mm_struct *mm,
-				  unsigned long address, pmd_t **pmdp,
-				  pte_t **ptep, spinlock_t **ptlp);
-#else
-static inline bool page_check_address_transhuge(struct page *page,
-				struct mm_struct *mm, unsigned long address,
-				pmd_t **pmdp, pte_t **ptep, spinlock_t **ptlp)
-{
-	*ptep = page_check_address(page, mm, address, ptlp, 0);
-	*pmdp = NULL;
-	return !!*ptep;
-}
-#endif
-
 /* Avoid racy checks */
 #define PAGE_CHECK_WALK_SYNC		(1 << 0)
 /* Look for migarion entries rather than present ptes */
diff --git a/mm/rmap.c b/mm/rmap.c
index cb34fd68a23a..7106eb9b37a8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -708,144 +708,6 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 	return pmd;
 }
 
-/*
- * Check that @page is mapped at @address into @mm.
- *
- * If @sync is false, page_check_address may perform a racy check to avoid
- * the page table lock when the pte is not present (helpful when reclaiming
- * highly shared pages).
- *
- * On success returns with pte mapped and locked.
- */
-pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
-			  unsigned long address, spinlock_t **ptlp, int sync)
-{
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	if (unlikely(PageHuge(page))) {
-		/* when pud is not present, pte will be NULL */
-		pte = huge_pte_offset(mm, address);
-		if (!pte)
-			return NULL;
-
-		ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
-		goto check;
-	}
-
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd)
-		return NULL;
-
-	pte = pte_offset_map(pmd, address);
-	/* Make a quick check before getting the lock */
-	if (!sync && !pte_present(*pte)) {
-		pte_unmap(pte);
-		return NULL;
-	}
-
-	ptl = pte_lockptr(mm, pmd);
-check:
-	spin_lock(ptl);
-	if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
-		*ptlp = ptl;
-		return pte;
-	}
-	pte_unmap_unlock(pte, ptl);
-	return NULL;
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/*
- * Check that @page is mapped at @address into @mm. In contrast to
- * page_check_address(), this function can handle transparent huge pages.
- *
- * On success returns true with pte mapped and locked. For PMD-mapped
- * transparent huge pages *@ptep is set to NULL.
- */
-bool page_check_address_transhuge(struct page *page, struct mm_struct *mm,
-				  unsigned long address, pmd_t **pmdp,
-				  pte_t **ptep, spinlock_t **ptlp)
-{
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	if (unlikely(PageHuge(page))) {
-		/* when pud is not present, pte will be NULL */
-		pte = huge_pte_offset(mm, address);
-		if (!pte)
-			return false;
-
-		ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
-		pmd = NULL;
-		goto check_pte;
-	}
-
-	pgd = pgd_offset(mm, address);
-	if (!pgd_present(*pgd))
-		return false;
-	pud = pud_offset(pgd, address);
-	if (!pud_present(*pud))
-		return false;
-	pmd = pmd_offset(pud, address);
-
-	if (pmd_trans_huge(*pmd)) {
-		ptl = pmd_lock(mm, pmd);
-		if (!pmd_present(*pmd))
-			goto unlock_pmd;
-		if (unlikely(!pmd_trans_huge(*pmd))) {
-			spin_unlock(ptl);
-			goto map_pte;
-		}
-
-		if (pmd_page(*pmd) != page)
-			goto unlock_pmd;
-
-		pte = NULL;
-		goto found;
-unlock_pmd:
-		spin_unlock(ptl);
-		return false;
-	} else {
-		pmd_t pmde = *pmd;
-
-		barrier();
-		if (!pmd_present(pmde) || pmd_trans_huge(pmde))
-			return false;
-	}
-map_pte:
-	pte = pte_offset_map(pmd, address);
-	if (!pte_present(*pte)) {
-		pte_unmap(pte);
-		return false;
-	}
-
-	ptl = pte_lockptr(mm, pmd);
-check_pte:
-	spin_lock(ptl);
-
-	if (!pte_present(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		return false;
-	}
-
-	/* THP can be referenced by any subpage */
-	if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
-		pte_unmap_unlock(pte, ptl);
-		return false;
-	}
-found:
-	*ptep = pte;
-	*pmdp = pmd;
-	*ptlp = ptl;
-	return true;
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
 struct page_referenced_arg {
 	int mapcount;
 	int referenced;
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 12/12] mm: convert remove_migration_pte() to page_check_walk()
  2017-01-24 16:28 ` Kirill A. Shutemov
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

remove_migration_pte() also can easily be converted to page_check_walk().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/migrate.c | 103 ++++++++++++++++++++++++-----------------------------------
 1 file changed, 41 insertions(+), 62 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 87f4d0f81819..11c9373242e7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -193,82 +193,61 @@ void putback_movable_pages(struct list_head *l)
 /*
  * Restore a potential migration pte to a working pte entry
  */
-static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
+static int remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 				 unsigned long addr, void *old)
 {
 	struct mm_struct *mm = vma->vm_mm;
+	struct page_check_walk pcw = {
+		.page = old,
+		.vma = vma,
+		.address = addr,
+		.flags = PAGE_CHECK_WALK_SYNC | PAGE_CHECK_WALK_MIGRATION,
+	};
+	struct page *new;
+	pte_t pte;
 	swp_entry_t entry;
- 	pmd_t *pmd;
-	pte_t *ptep, pte;
- 	spinlock_t *ptl;
 
-	if (unlikely(PageHuge(new))) {
-		ptep = huge_pte_offset(mm, addr);
-		if (!ptep)
-			goto out;
-		ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
-	} else {
-		pmd = mm_find_pmd(mm, addr);
-		if (!pmd)
-			goto out;
+	VM_BUG_ON_PAGE(PageTail(page), page);
+	while (page_check_walk(&pcw)) {
+		new = page - pcw.page->index +
+			linear_page_index(vma, pcw.address);
 
-		ptep = pte_offset_map(pmd, addr);
+		get_page(new);
+		pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot)));
+		if (pte_swp_soft_dirty(*pcw.pte))
+			pte = pte_mksoft_dirty(pte);
 
-		/*
-		 * Peek to check is_swap_pte() before taking ptlock?  No, we
-		 * can race mremap's move_ptes(), which skips anon_vma lock.
-		 */
-
-		ptl = pte_lockptr(mm, pmd);
-	}
-
- 	spin_lock(ptl);
-	pte = *ptep;
-	if (!is_swap_pte(pte))
-		goto unlock;
-
-	entry = pte_to_swp_entry(pte);
-
-	if (!is_migration_entry(entry) ||
-	    migration_entry_to_page(entry) != old)
-		goto unlock;
-
-	get_page(new);
-	pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot)));
-	if (pte_swp_soft_dirty(*ptep))
-		pte = pte_mksoft_dirty(pte);
-
-	/* Recheck VMA as permissions can change since migration started  */
-	if (is_write_migration_entry(entry))
-		pte = maybe_mkwrite(pte, vma);
+		/* Recheck VMA as permissions can change since migration started  */
+		entry = pte_to_swp_entry(*pcw.pte);
+		if (is_write_migration_entry(entry))
+			pte = maybe_mkwrite(pte, vma);
 
 #ifdef CONFIG_HUGETLB_PAGE
-	if (PageHuge(new)) {
-		pte = pte_mkhuge(pte);
-		pte = arch_make_huge_pte(pte, vma, new, 0);
-	}
+		if (PageHuge(new)) {
+			pte = pte_mkhuge(pte);
+			pte = arch_make_huge_pte(pte, vma, new, 0);
+		}
 #endif
-	flush_dcache_page(new);
-	set_pte_at(mm, addr, ptep, pte);
+		flush_dcache_page(new);
+		set_pte_at(mm, pcw.address, pcw.pte, pte);
 
-	if (PageHuge(new)) {
-		if (PageAnon(new))
-			hugepage_add_anon_rmap(new, vma, addr);
+		if (PageHuge(new)) {
+			if (PageAnon(new))
+				hugepage_add_anon_rmap(new, vma, pcw.address);
+			else
+				page_dup_rmap(new, true);
+		} else if (PageAnon(new))
+			page_add_anon_rmap(new, vma, pcw.address, false);
 		else
-			page_dup_rmap(new, true);
-	} else if (PageAnon(new))
-		page_add_anon_rmap(new, vma, addr, false);
-	else
-		page_add_file_rmap(new, false);
+			page_add_file_rmap(new, false);
 
-	if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
-		mlock_vma_page(new);
+		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
+			mlock_vma_page(new);
+
+		/* No need to invalidate - it was non-present before */
+		update_mmu_cache(vma, pcw.address, pcw.pte);
+	}
 
-	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, addr, ptep);
-unlock:
-	pte_unmap_unlock(ptep, ptl);
-out:
 	return SWAP_AGAIN;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 12/12] mm: convert remove_migration_pte() to page_check_walk()
@ 2017-01-24 16:28   ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 16:28 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov

remove_migration_pte() also can easily be converted to page_check_walk().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/migrate.c | 103 ++++++++++++++++++++++++-----------------------------------
 1 file changed, 41 insertions(+), 62 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 87f4d0f81819..11c9373242e7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -193,82 +193,61 @@ void putback_movable_pages(struct list_head *l)
 /*
  * Restore a potential migration pte to a working pte entry
  */
-static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
+static int remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 				 unsigned long addr, void *old)
 {
 	struct mm_struct *mm = vma->vm_mm;
+	struct page_check_walk pcw = {
+		.page = old,
+		.vma = vma,
+		.address = addr,
+		.flags = PAGE_CHECK_WALK_SYNC | PAGE_CHECK_WALK_MIGRATION,
+	};
+	struct page *new;
+	pte_t pte;
 	swp_entry_t entry;
- 	pmd_t *pmd;
-	pte_t *ptep, pte;
- 	spinlock_t *ptl;
 
-	if (unlikely(PageHuge(new))) {
-		ptep = huge_pte_offset(mm, addr);
-		if (!ptep)
-			goto out;
-		ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
-	} else {
-		pmd = mm_find_pmd(mm, addr);
-		if (!pmd)
-			goto out;
+	VM_BUG_ON_PAGE(PageTail(page), page);
+	while (page_check_walk(&pcw)) {
+		new = page - pcw.page->index +
+			linear_page_index(vma, pcw.address);
 
-		ptep = pte_offset_map(pmd, addr);
+		get_page(new);
+		pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot)));
+		if (pte_swp_soft_dirty(*pcw.pte))
+			pte = pte_mksoft_dirty(pte);
 
-		/*
-		 * Peek to check is_swap_pte() before taking ptlock?  No, we
-		 * can race mremap's move_ptes(), which skips anon_vma lock.
-		 */
-
-		ptl = pte_lockptr(mm, pmd);
-	}
-
- 	spin_lock(ptl);
-	pte = *ptep;
-	if (!is_swap_pte(pte))
-		goto unlock;
-
-	entry = pte_to_swp_entry(pte);
-
-	if (!is_migration_entry(entry) ||
-	    migration_entry_to_page(entry) != old)
-		goto unlock;
-
-	get_page(new);
-	pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot)));
-	if (pte_swp_soft_dirty(*ptep))
-		pte = pte_mksoft_dirty(pte);
-
-	/* Recheck VMA as permissions can change since migration started  */
-	if (is_write_migration_entry(entry))
-		pte = maybe_mkwrite(pte, vma);
+		/* Recheck VMA as permissions can change since migration started  */
+		entry = pte_to_swp_entry(*pcw.pte);
+		if (is_write_migration_entry(entry))
+			pte = maybe_mkwrite(pte, vma);
 
 #ifdef CONFIG_HUGETLB_PAGE
-	if (PageHuge(new)) {
-		pte = pte_mkhuge(pte);
-		pte = arch_make_huge_pte(pte, vma, new, 0);
-	}
+		if (PageHuge(new)) {
+			pte = pte_mkhuge(pte);
+			pte = arch_make_huge_pte(pte, vma, new, 0);
+		}
 #endif
-	flush_dcache_page(new);
-	set_pte_at(mm, addr, ptep, pte);
+		flush_dcache_page(new);
+		set_pte_at(mm, pcw.address, pcw.pte, pte);
 
-	if (PageHuge(new)) {
-		if (PageAnon(new))
-			hugepage_add_anon_rmap(new, vma, addr);
+		if (PageHuge(new)) {
+			if (PageAnon(new))
+				hugepage_add_anon_rmap(new, vma, pcw.address);
+			else
+				page_dup_rmap(new, true);
+		} else if (PageAnon(new))
+			page_add_anon_rmap(new, vma, pcw.address, false);
 		else
-			page_dup_rmap(new, true);
-	} else if (PageAnon(new))
-		page_add_anon_rmap(new, vma, addr, false);
-	else
-		page_add_file_rmap(new, false);
+			page_add_file_rmap(new, false);
 
-	if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
-		mlock_vma_page(new);
+		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
+			mlock_vma_page(new);
+
+		/* No need to invalidate - it was non-present before */
+		update_mmu_cache(vma, pcw.address, pcw.pte);
+	}
 
-	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, addr, ptep);
-unlock:
-	pte_unmap_unlock(ptep, ptl);
-out:
 	return SWAP_AGAIN;
 }
 
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-24 18:08     ` Rik van Riel
  -1 siblings, 0 replies; 71+ messages in thread
From: Rik van Riel @ 2017-01-24 18:08 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Andrew Morton
  Cc: linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, 2017-01-24 at 19:28 +0300, Kirill A. Shutemov wrote:
> For THPs page_check_address() always fails. It's better to split them
> first before trying to replace.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-24 18:08     ` Rik van Riel
  0 siblings, 0 replies; 71+ messages in thread
From: Rik van Riel @ 2017-01-24 18:08 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Andrew Morton
  Cc: linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, 2017-01-24 at 19:28 +0300, Kirill A. Shutemov wrote:
> For THPs page_check_address() always fails. It's better to split them
> first before trying to replace.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>

Acked-by: Rik van Riel <riel@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-24 21:28     ` Andrew Morton
  -1 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 21:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> For THPs page_check_address() always fails. It's better to split them
> first before trying to replace.

So what does this mean.  uprobes simply fails to work when trying to
place a probe into a THP memory region?  How come nobody noticed (and
reported) this when using the feature?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-24 21:28     ` Andrew Morton
  0 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 21:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> For THPs page_check_address() always fails. It's better to split them
> first before trying to replace.

So what does this mean.  uprobes simply fails to work when trying to
place a probe into a THP memory region?  How come nobody noticed (and
reported) this when using the feature?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-24 21:41     ` Andrew Morton
  -1 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 21:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm, linux-kernel

On Tue, 24 Jan 2017 19:28:14 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> The patch introduce new interface to check if a page is mapped into a vma.
> It aims to address shortcomings of page_check_address{,_transhuge}.
> 
> Existing interface is not able to handle PTE-mapped THPs: it only finds
> the first PTE. The rest lefted unnoticed.
> 
> page_check_walk() iterates over all possible mapping of the page in the
> vma.

I really don't like the name page_check_walk().  "check" could mean any
damn thing.  Something like page_vma_mapped_walk() has meaning.  We
could omit the "_walk" for brevity.


> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  include/linux/rmap.h |  65 ++++++++++++++++++++++
>  mm/Makefile          |   6 ++-
>  mm/huge_memory.c     |   9 ++--
>  mm/page_check.c      | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 223 insertions(+), 5 deletions(-)
>  create mode 100644 mm/page_check.c
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 15321fb1df6b..474279810742 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -232,6 +232,71 @@ static inline bool page_check_address_transhuge(struct page *page,
>  }
>  #endif
>  
> +/* Avoid racy checks */
> +#define PAGE_CHECK_WALK_SYNC		(1 << 0)
> +/* Look for migarion entries rather than present ptes */
> +#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
> +
> +struct page_check_walk {
> +	struct page *page;
> +	struct vm_area_struct *vma;
> +	unsigned long address;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	spinlock_t *ptl;
> +	unsigned int flags;
> +};

One thing which I don't think was documented is that it is the caller's
responsibility to initialize this appropriately before calling
page_check_walk().  At least, .pte and .ptl must be NULL, for
page_check_walk_done().

> +static inline void page_check_walk_done(struct page_check_walk *pcw)
> +{
> +	if (pcw->pte)
> +		pte_unmap(pcw->pte);
> +	if (pcw->ptl)
> +		spin_unlock(pcw->ptl);
> +}
> +
> +bool __page_check_walk(struct page_check_walk *pcw);
> +
> +/**
> + * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address
> + * @pcw: pointer to struce page_check_walk. page, vma and address must be set.

"struct"

> + *
> + * Returns true, if the page is mapped in the vma. @pcw->pmd and @pcw->pte point

"Returns true if"

> + * to relevant page table entries. @pcw->ptl is locked. @pcw->address is
> + * adjusted if needed (for PTE-mapped THPs).
> + *
> + * If @pcw->pmd is set, but @pcw->pte is not, you have found PMD-mapped page

"is set but"

> + * (usually THP). For PTE-mapped THP, you should run page_check_walk() in 
> + * a loop to find all PTEs that maps the THP.

"that map"

> + *
> + * For HugeTLB pages, @pcw->pte is set to relevant page table entry regardless

"set to the relevant", "regardless of"

> + * which page table level the page mapped at. @pcw->pmd is NULL.

"the page is"

> + *
> + * Retruns false, if there's no more page table entries for the page in the vma.

"Returns false if there are"

> + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> + *
> + * If you need to stop the walk before page_check_walk() returned false, use
> + * page_check_walk_done(). It will do the housekeeping.
> + */
> +static inline bool page_check_walk(struct page_check_walk *pcw)
> +{
> +	/* The only possible pmd mapping has been handled on last iteration */
> +	if (pcw->pmd && !pcw->pte) {
> +		page_check_walk_done(pcw);
> +		return false;
> +	}
> +
> +	/* Only for THP, seek to next pte entry makes sense */
> +	if (pcw->pte) {
> +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> +			page_check_walk_done(pcw);
> +			return false;
> +		}
> +	}
> +
> +	return __page_check_walk(pcw);
> +}

Was the decision to inline this a correct one?

> --- /dev/null
> +++ b/mm/page_check.c
> @@ -0,0 +1,148 @@
> +#include <linux/mm.h>
> +#include <linux/rmap.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
> +
> +#include "internal.h"
> +
> +static inline bool check_pmd(struct page_check_walk *pcw)
> +{
> +	pmd_t pmde = *pcw->pmd;
> +	barrier();
> +	return pmd_present(pmde) && !pmd_trans_huge(pmde);
> +}

Can we please have a comment explaining what the barrier() does?

> +static inline bool not_found(struct page_check_walk *pcw)
> +{
> +	page_check_walk_done(pcw);
> +	return false;
> +}
> +
> +static inline bool map_pte(struct page_check_walk *pcw)
> +{
> +	pcw->pte = pte_offset_map(pcw->pmd, pcw->address);
> +	if (!(pcw->flags & PAGE_CHECK_WALK_SYNC)) {
> +		if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> +			if (!is_swap_pte(*pcw->pte))
> +				return false;
> +		} else {
> +			if (!pte_present(*pcw->pte))
> +				return false;
> +		}
> +	}
> +	pcw->ptl = pte_lockptr(pcw->vma->vm_mm, pcw->pmd);
> +	spin_lock(pcw->ptl);
> +	return true;
> +}

The compiler will just ignore all these "inline" statements.

> +static inline bool check_pte(struct page_check_walk *pcw)
> +{
> +	if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> +		swp_entry_t entry;
> +		if (!is_swap_pte(*pcw->pte))
> +			return false;
> +		entry = pte_to_swp_entry(*pcw->pte);
> +		if (!is_migration_entry(entry))
> +			return false;
> +		if (migration_entry_to_page(entry) - pcw->page >=
> +				hpage_nr_pages(pcw->page)) {
> +			return false;
> +		}
> +		if (migration_entry_to_page(entry) < pcw->page)
> +			return false;
> +	} else {
> +		if (!pte_present(*pcw->pte))
> +			return false;
> +
> +		/* THP can be referenced by any subpage */
> +		if (pte_page(*pcw->pte) - pcw->page >=
> +				hpage_nr_pages(pcw->page)) {
> +			return false;
> +		}
> +		if (pte_page(*pcw->pte) < pcw->page)
> +			return false;
> +	}
> +
> +	return true;
> +}

Thankfully, because inlining this one does seem inappropriate - it's
enormous!

> +bool __page_check_walk(struct page_check_walk *pcw)
> +{
> +	struct mm_struct *mm = pcw->vma->vm_mm;
> +	struct page *page = pcw->page;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +
> +	/* For THP, seek to next pte entry */
> +	if (pcw->pte)
> +		goto next_pte;
> +
> +	if (unlikely(PageHuge(pcw->page))) {
> +		/* when pud is not present, pte will be NULL */
> +		pcw->pte = huge_pte_offset(mm, pcw->address);
> +		if (!pcw->pte)
> +			return false;
> +
> +		pcw->ptl = huge_pte_lockptr(page_hstate(page), mm, pcw->pte);
> +		spin_lock(pcw->ptl);
> +		if (!check_pte(pcw))
> +			return not_found(pcw);
> +		return true;
> +	}
> +restart:
> +	pgd = pgd_offset(mm, pcw->address);
> +	if (!pgd_present(*pgd))
> +		return false;
> +	pud = pud_offset(pgd, pcw->address);
> +	if (!pud_present(*pud))
> +		return false;
> +	pcw->pmd = pmd_offset(pud, pcw->address);
> +	if (pmd_trans_huge(*pcw->pmd)) {
> +		pcw->ptl = pmd_lock(mm, pcw->pmd);
> +		if (!pmd_present(*pcw->pmd))
> +			return not_found(pcw);
> +		if (likely(pmd_trans_huge(*pcw->pmd))) {
> +			if (pcw->flags & PAGE_CHECK_WALK_MIGRATION)
> +				return not_found(pcw);
> +			if (pmd_page(*pcw->pmd) != page)
> +				return not_found(pcw);
> +			return true;
> +		} else {
> +			/* THP pmd was split under us: handle on pte level */
> +			spin_unlock(pcw->ptl);
> +			pcw->ptl = NULL;
> +		}
> +	} else {
> +		if (!check_pmd(pcw))
> +			return false;
> +	}
> +	if (!map_pte(pcw))
> +		goto next_pte;
> +	while (1) {
> +		if (check_pte(pcw))
> +			return true;
> +next_pte:	do {
> +			pcw->address += PAGE_SIZE;
> +			if (pcw->address >= __vma_address(pcw->page, pcw->vma) +
> +					hpage_nr_pages(pcw->page) * PAGE_SIZE)
> +				return not_found(pcw);
> +			/* Did we cross page table boundary? */
> +			if (pcw->address % PMD_SIZE == 0) {
> +				pte_unmap(pcw->pte);
> +				if (pcw->ptl) {
> +					spin_unlock(pcw->ptl);
> +					pcw->ptl = NULL;
> +				}
> +				goto restart;
> +			} else {
> +				pcw->pte++;
> +			}
> +		} while (pte_none(*pcw->pte));
> +
> +		if (!pcw->ptl) {
> +			pcw->ptl = pte_lockptr(mm, pcw->pmd);
> +			spin_lock(pcw->ptl);
> +		}
> +	}
> +}

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
@ 2017-01-24 21:41     ` Andrew Morton
  0 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 21:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm, linux-kernel

On Tue, 24 Jan 2017 19:28:14 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> The patch introduce new interface to check if a page is mapped into a vma.
> It aims to address shortcomings of page_check_address{,_transhuge}.
> 
> Existing interface is not able to handle PTE-mapped THPs: it only finds
> the first PTE. The rest lefted unnoticed.
> 
> page_check_walk() iterates over all possible mapping of the page in the
> vma.

I really don't like the name page_check_walk().  "check" could mean any
damn thing.  Something like page_vma_mapped_walk() has meaning.  We
could omit the "_walk" for brevity.


> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  include/linux/rmap.h |  65 ++++++++++++++++++++++
>  mm/Makefile          |   6 ++-
>  mm/huge_memory.c     |   9 ++--
>  mm/page_check.c      | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 223 insertions(+), 5 deletions(-)
>  create mode 100644 mm/page_check.c
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 15321fb1df6b..474279810742 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -232,6 +232,71 @@ static inline bool page_check_address_transhuge(struct page *page,
>  }
>  #endif
>  
> +/* Avoid racy checks */
> +#define PAGE_CHECK_WALK_SYNC		(1 << 0)
> +/* Look for migarion entries rather than present ptes */
> +#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
> +
> +struct page_check_walk {
> +	struct page *page;
> +	struct vm_area_struct *vma;
> +	unsigned long address;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	spinlock_t *ptl;
> +	unsigned int flags;
> +};

One thing which I don't think was documented is that it is the caller's
responsibility to initialize this appropriately before calling
page_check_walk().  At least, .pte and .ptl must be NULL, for
page_check_walk_done().

> +static inline void page_check_walk_done(struct page_check_walk *pcw)
> +{
> +	if (pcw->pte)
> +		pte_unmap(pcw->pte);
> +	if (pcw->ptl)
> +		spin_unlock(pcw->ptl);
> +}
> +
> +bool __page_check_walk(struct page_check_walk *pcw);
> +
> +/**
> + * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address
> + * @pcw: pointer to struce page_check_walk. page, vma and address must be set.

"struct"

> + *
> + * Returns true, if the page is mapped in the vma. @pcw->pmd and @pcw->pte point

"Returns true if"

> + * to relevant page table entries. @pcw->ptl is locked. @pcw->address is
> + * adjusted if needed (for PTE-mapped THPs).
> + *
> + * If @pcw->pmd is set, but @pcw->pte is not, you have found PMD-mapped page

"is set but"

> + * (usually THP). For PTE-mapped THP, you should run page_check_walk() in 
> + * a loop to find all PTEs that maps the THP.

"that map"

> + *
> + * For HugeTLB pages, @pcw->pte is set to relevant page table entry regardless

"set to the relevant", "regardless of"

> + * which page table level the page mapped at. @pcw->pmd is NULL.

"the page is"

> + *
> + * Retruns false, if there's no more page table entries for the page in the vma.

"Returns false if there are"

> + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> + *
> + * If you need to stop the walk before page_check_walk() returned false, use
> + * page_check_walk_done(). It will do the housekeeping.
> + */
> +static inline bool page_check_walk(struct page_check_walk *pcw)
> +{
> +	/* The only possible pmd mapping has been handled on last iteration */
> +	if (pcw->pmd && !pcw->pte) {
> +		page_check_walk_done(pcw);
> +		return false;
> +	}
> +
> +	/* Only for THP, seek to next pte entry makes sense */
> +	if (pcw->pte) {
> +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> +			page_check_walk_done(pcw);
> +			return false;
> +		}
> +	}
> +
> +	return __page_check_walk(pcw);
> +}

Was the decision to inline this a correct one?

> --- /dev/null
> +++ b/mm/page_check.c
> @@ -0,0 +1,148 @@
> +#include <linux/mm.h>
> +#include <linux/rmap.h>
> +#include <linux/hugetlb.h>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
> +
> +#include "internal.h"
> +
> +static inline bool check_pmd(struct page_check_walk *pcw)
> +{
> +	pmd_t pmde = *pcw->pmd;
> +	barrier();
> +	return pmd_present(pmde) && !pmd_trans_huge(pmde);
> +}

Can we please have a comment explaining what the barrier() does?

> +static inline bool not_found(struct page_check_walk *pcw)
> +{
> +	page_check_walk_done(pcw);
> +	return false;
> +}
> +
> +static inline bool map_pte(struct page_check_walk *pcw)
> +{
> +	pcw->pte = pte_offset_map(pcw->pmd, pcw->address);
> +	if (!(pcw->flags & PAGE_CHECK_WALK_SYNC)) {
> +		if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> +			if (!is_swap_pte(*pcw->pte))
> +				return false;
> +		} else {
> +			if (!pte_present(*pcw->pte))
> +				return false;
> +		}
> +	}
> +	pcw->ptl = pte_lockptr(pcw->vma->vm_mm, pcw->pmd);
> +	spin_lock(pcw->ptl);
> +	return true;
> +}

The compiler will just ignore all these "inline" statements.

> +static inline bool check_pte(struct page_check_walk *pcw)
> +{
> +	if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> +		swp_entry_t entry;
> +		if (!is_swap_pte(*pcw->pte))
> +			return false;
> +		entry = pte_to_swp_entry(*pcw->pte);
> +		if (!is_migration_entry(entry))
> +			return false;
> +		if (migration_entry_to_page(entry) - pcw->page >=
> +				hpage_nr_pages(pcw->page)) {
> +			return false;
> +		}
> +		if (migration_entry_to_page(entry) < pcw->page)
> +			return false;
> +	} else {
> +		if (!pte_present(*pcw->pte))
> +			return false;
> +
> +		/* THP can be referenced by any subpage */
> +		if (pte_page(*pcw->pte) - pcw->page >=
> +				hpage_nr_pages(pcw->page)) {
> +			return false;
> +		}
> +		if (pte_page(*pcw->pte) < pcw->page)
> +			return false;
> +	}
> +
> +	return true;
> +}

Thankfully, because inlining this one does seem inappropriate - it's
enormous!

> +bool __page_check_walk(struct page_check_walk *pcw)
> +{
> +	struct mm_struct *mm = pcw->vma->vm_mm;
> +	struct page *page = pcw->page;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +
> +	/* For THP, seek to next pte entry */
> +	if (pcw->pte)
> +		goto next_pte;
> +
> +	if (unlikely(PageHuge(pcw->page))) {
> +		/* when pud is not present, pte will be NULL */
> +		pcw->pte = huge_pte_offset(mm, pcw->address);
> +		if (!pcw->pte)
> +			return false;
> +
> +		pcw->ptl = huge_pte_lockptr(page_hstate(page), mm, pcw->pte);
> +		spin_lock(pcw->ptl);
> +		if (!check_pte(pcw))
> +			return not_found(pcw);
> +		return true;
> +	}
> +restart:
> +	pgd = pgd_offset(mm, pcw->address);
> +	if (!pgd_present(*pgd))
> +		return false;
> +	pud = pud_offset(pgd, pcw->address);
> +	if (!pud_present(*pud))
> +		return false;
> +	pcw->pmd = pmd_offset(pud, pcw->address);
> +	if (pmd_trans_huge(*pcw->pmd)) {
> +		pcw->ptl = pmd_lock(mm, pcw->pmd);
> +		if (!pmd_present(*pcw->pmd))
> +			return not_found(pcw);
> +		if (likely(pmd_trans_huge(*pcw->pmd))) {
> +			if (pcw->flags & PAGE_CHECK_WALK_MIGRATION)
> +				return not_found(pcw);
> +			if (pmd_page(*pcw->pmd) != page)
> +				return not_found(pcw);
> +			return true;
> +		} else {
> +			/* THP pmd was split under us: handle on pte level */
> +			spin_unlock(pcw->ptl);
> +			pcw->ptl = NULL;
> +		}
> +	} else {
> +		if (!check_pmd(pcw))
> +			return false;
> +	}
> +	if (!map_pte(pcw))
> +		goto next_pte;
> +	while (1) {
> +		if (check_pte(pcw))
> +			return true;
> +next_pte:	do {
> +			pcw->address += PAGE_SIZE;
> +			if (pcw->address >= __vma_address(pcw->page, pcw->vma) +
> +					hpage_nr_pages(pcw->page) * PAGE_SIZE)
> +				return not_found(pcw);
> +			/* Did we cross page table boundary? */
> +			if (pcw->address % PMD_SIZE == 0) {
> +				pte_unmap(pcw->pte);
> +				if (pcw->ptl) {
> +					spin_unlock(pcw->ptl);
> +					pcw->ptl = NULL;
> +				}
> +				goto restart;
> +			} else {
> +				pcw->pte++;
> +			}
> +		} while (pte_none(*pcw->pte));
> +
> +		if (!pcw->ptl) {
> +			pcw->ptl = pte_lockptr(mm, pcw->pmd);
> +			spin_lock(pcw->ptl);
> +		}
> +	}
> +}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 21:28     ` Andrew Morton
@ 2017-01-24 22:22       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 22:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, Jan 24, 2017 at 01:28:49PM -0800, Andrew Morton wrote:
> On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > For THPs page_check_address() always fails. It's better to split them
> > first before trying to replace.
> 
> So what does this mean.  uprobes simply fails to work when trying to
> place a probe into a THP memory region?

Looks like we can end up with endless retry loop in uprobe_write_opcode().

> How come nobody noticed (and reported) this when using the feature?

I guess it's not often used for anon memory.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-24 22:22       ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 22:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, Jan 24, 2017 at 01:28:49PM -0800, Andrew Morton wrote:
> On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > For THPs page_check_address() always fails. It's better to split them
> > first before trying to replace.
> 
> So what does this mean.  uprobes simply fails to work when trying to
> place a probe into a THP memory region?

Looks like we can end up with endless retry loop in uprobe_write_opcode().

> How come nobody noticed (and reported) this when using the feature?

I guess it's not often used for anon memory.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 22:22       ` Kirill A. Shutemov
@ 2017-01-24 22:35         ` Andrew Morton
  -1 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 22:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Wed, 25 Jan 2017 01:22:17 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Tue, Jan 24, 2017 at 01:28:49PM -0800, Andrew Morton wrote:
> > On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > 
> > > For THPs page_check_address() always fails. It's better to split them
> > > first before trying to replace.
> > 
> > So what does this mean.  uprobes simply fails to work when trying to
> > place a probe into a THP memory region?
> 
> Looks like we can end up with endless retry loop in uprobe_write_opcode().
> 
> > How come nobody noticed (and reported) this when using the feature?
> 
> I guess it's not often used for anon memory.

OK,  can we please include discussion of these things in the changelog?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-24 22:35         ` Andrew Morton
  0 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 22:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Wed, 25 Jan 2017 01:22:17 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Tue, Jan 24, 2017 at 01:28:49PM -0800, Andrew Morton wrote:
> > On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > 
> > > For THPs page_check_address() always fails. It's better to split them
> > > first before trying to replace.
> > 
> > So what does this mean.  uprobes simply fails to work when trying to
> > place a probe into a THP memory region?
> 
> Looks like we can end up with endless retry loop in uprobe_write_opcode().
> 
> > How come nobody noticed (and reported) this when using the feature?
> 
> I guess it's not often used for anon memory.

OK,  can we please include discussion of these things in the changelog?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
  2017-01-24 21:41     ` Andrew Morton
@ 2017-01-24 22:50       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 22:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel

On Tue, Jan 24, 2017 at 01:41:22PM -0800, Andrew Morton wrote:
> On Tue, 24 Jan 2017 19:28:14 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > The patch introduce new interface to check if a page is mapped into a vma.
> > It aims to address shortcomings of page_check_address{,_transhuge}.
> > 
> > Existing interface is not able to handle PTE-mapped THPs: it only finds
> > the first PTE. The rest lefted unnoticed.
> > 
> > page_check_walk() iterates over all possible mapping of the page in the
> > vma.
> 
> I really don't like the name page_check_walk().  "check" could mean any
> damn thing.  Something like page_vma_mapped_walk() has meaning.  We
> could omit the "_walk" for brevity.

page_vma_mapped() would sound as predicate.
I'll rename to page_vma_mapped_walk().

> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  include/linux/rmap.h |  65 ++++++++++++++++++++++
> >  mm/Makefile          |   6 ++-
> >  mm/huge_memory.c     |   9 ++--
> >  mm/page_check.c      | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 223 insertions(+), 5 deletions(-)
> >  create mode 100644 mm/page_check.c
> > 
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 15321fb1df6b..474279810742 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -232,6 +232,71 @@ static inline bool page_check_address_transhuge(struct page *page,
> >  }
> >  #endif
> >  
> > +/* Avoid racy checks */
> > +#define PAGE_CHECK_WALK_SYNC		(1 << 0)
> > +/* Look for migarion entries rather than present ptes */
> > +#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
> > +
> > +struct page_check_walk {
> > +	struct page *page;
> > +	struct vm_area_struct *vma;
> > +	unsigned long address;
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +	spinlock_t *ptl;
> > +	unsigned int flags;
> > +};
> 
> One thing which I don't think was documented is that it is the caller's
> responsibility to initialize this appropriately before calling
> page_check_walk().  At least, .pte and .ptl must be NULL, for
> page_check_walk_done().

Okay, I'll write it down.

> > +static inline void page_check_walk_done(struct page_check_walk *pcw)
> > +{
> > +	if (pcw->pte)
> > +		pte_unmap(pcw->pte);
> > +	if (pcw->ptl)
> > +		spin_unlock(pcw->ptl);
> > +}
> > +
> > +bool __page_check_walk(struct page_check_walk *pcw);
> > +
> > +/**
> > + * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address
> > + * @pcw: pointer to struce page_check_walk. page, vma and address must be set.
> 
> "struct"
> 
> > + *
> > + * Returns true, if the page is mapped in the vma. @pcw->pmd and @pcw->pte point
> 
> "Returns true if"
> 
> > + * to relevant page table entries. @pcw->ptl is locked. @pcw->address is
> > + * adjusted if needed (for PTE-mapped THPs).
> > + *
> > + * If @pcw->pmd is set, but @pcw->pte is not, you have found PMD-mapped page
> 
> "is set but"
> 
> > + * (usually THP). For PTE-mapped THP, you should run page_check_walk() in 
> > + * a loop to find all PTEs that maps the THP.
> 
> "that map"
> 
> > + *
> > + * For HugeTLB pages, @pcw->pte is set to relevant page table entry regardless
> 
> "set to the relevant", "regardless of"
> 
> > + * which page table level the page mapped at. @pcw->pmd is NULL.
> 
> "the page is"
> 
> > + *
> > + * Retruns false, if there's no more page table entries for the page in the vma.
> 
> "Returns false if there are"

Thanks for corrections. Applied.

> > + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> > + *
> > + * If you need to stop the walk before page_check_walk() returned false, use
> > + * page_check_walk_done(). It will do the housekeeping.
> > + */
> > +static inline bool page_check_walk(struct page_check_walk *pcw)
> > +{
> > +	/* The only possible pmd mapping has been handled on last iteration */
> > +	if (pcw->pmd && !pcw->pte) {
> > +		page_check_walk_done(pcw);
> > +		return false;
> > +	}
> > +
> > +	/* Only for THP, seek to next pte entry makes sense */
> > +	if (pcw->pte) {
> > +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> > +			page_check_walk_done(pcw);
> > +			return false;
> > +		}
> > +	}
> > +
> > +	return __page_check_walk(pcw);
> > +}
> 
> Was the decision to inline this a correct one?

Well, my logic was that in most cases we would have exactly one iteration.
The only case when we need more than one iteration is PTE-mapped THP which
is rare.
I hoped to avoid additional function call. Not sure if it worth it.

Should I move it inside the function?

> > --- /dev/null
> > +++ b/mm/page_check.c
> > @@ -0,0 +1,148 @@
> > +#include <linux/mm.h>
> > +#include <linux/rmap.h>
> > +#include <linux/hugetlb.h>
> > +#include <linux/swap.h>
> > +#include <linux/swapops.h>
> > +
> > +#include "internal.h"
> > +
> > +static inline bool check_pmd(struct page_check_walk *pcw)
> > +{
> > +	pmd_t pmde = *pcw->pmd;
> > +	barrier();
> > +	return pmd_present(pmde) && !pmd_trans_huge(pmde);
> > +}
> 
> Can we please have a comment explaining what the barrier() does?

I copied it from page_check_address_transhuge().

I think we can get aways with READ_ONCE() instead.
I'll add a comment.

> > +static inline bool not_found(struct page_check_walk *pcw)
> > +{
> > +	page_check_walk_done(pcw);
> > +	return false;
> > +}
> > +
> > +static inline bool map_pte(struct page_check_walk *pcw)
> > +{
> > +	pcw->pte = pte_offset_map(pcw->pmd, pcw->address);
> > +	if (!(pcw->flags & PAGE_CHECK_WALK_SYNC)) {
> > +		if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> > +			if (!is_swap_pte(*pcw->pte))
> > +				return false;
> > +		} else {
> > +			if (!pte_present(*pcw->pte))
> > +				return false;
> > +		}
> > +	}
> > +	pcw->ptl = pte_lockptr(pcw->vma->vm_mm, pcw->pmd);
> > +	spin_lock(pcw->ptl);
> > +	return true;
> > +}
> 
> The compiler will just ignore all these "inline" statements.

These helpers grew during development. I forgot to drop "inline".

> > +static inline bool check_pte(struct page_check_walk *pcw)
> > +{
> > +	if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> > +		swp_entry_t entry;
> > +		if (!is_swap_pte(*pcw->pte))
> > +			return false;
> > +		entry = pte_to_swp_entry(*pcw->pte);
> > +		if (!is_migration_entry(entry))
> > +			return false;
> > +		if (migration_entry_to_page(entry) - pcw->page >=
> > +				hpage_nr_pages(pcw->page)) {
> > +			return false;
> > +		}
> > +		if (migration_entry_to_page(entry) < pcw->page)
> > +			return false;
> > +	} else {
> > +		if (!pte_present(*pcw->pte))
> > +			return false;
> > +
> > +		/* THP can be referenced by any subpage */
> > +		if (pte_page(*pcw->pte) - pcw->page >=
> > +				hpage_nr_pages(pcw->page)) {
> > +			return false;
> > +		}
> > +		if (pte_page(*pcw->pte) < pcw->page)
> > +			return false;
> > +	}
> > +
> > +	return true;
> > +}
> 
> Thankfully, because inlining this one does seem inappropriate - it's
> enormous!
> 
> > +bool __page_check_walk(struct page_check_walk *pcw)
> > +{
> > +	struct mm_struct *mm = pcw->vma->vm_mm;
> > +	struct page *page = pcw->page;
> > +	pgd_t *pgd;
> > +	pud_t *pud;
> > +
> > +	/* For THP, seek to next pte entry */
> > +	if (pcw->pte)
> > +		goto next_pte;
> > +
> > +	if (unlikely(PageHuge(pcw->page))) {
> > +		/* when pud is not present, pte will be NULL */
> > +		pcw->pte = huge_pte_offset(mm, pcw->address);
> > +		if (!pcw->pte)
> > +			return false;
> > +
> > +		pcw->ptl = huge_pte_lockptr(page_hstate(page), mm, pcw->pte);
> > +		spin_lock(pcw->ptl);
> > +		if (!check_pte(pcw))
> > +			return not_found(pcw);
> > +		return true;
> > +	}
> > +restart:
> > +	pgd = pgd_offset(mm, pcw->address);
> > +	if (!pgd_present(*pgd))
> > +		return false;
> > +	pud = pud_offset(pgd, pcw->address);
> > +	if (!pud_present(*pud))
> > +		return false;
> > +	pcw->pmd = pmd_offset(pud, pcw->address);
> > +	if (pmd_trans_huge(*pcw->pmd)) {
> > +		pcw->ptl = pmd_lock(mm, pcw->pmd);
> > +		if (!pmd_present(*pcw->pmd))
> > +			return not_found(pcw);
> > +		if (likely(pmd_trans_huge(*pcw->pmd))) {
> > +			if (pcw->flags & PAGE_CHECK_WALK_MIGRATION)
> > +				return not_found(pcw);
> > +			if (pmd_page(*pcw->pmd) != page)
> > +				return not_found(pcw);
> > +			return true;
> > +		} else {
> > +			/* THP pmd was split under us: handle on pte level */
> > +			spin_unlock(pcw->ptl);
> > +			pcw->ptl = NULL;
> > +		}
> > +	} else {
> > +		if (!check_pmd(pcw))
> > +			return false;
> > +	}
> > +	if (!map_pte(pcw))
> > +		goto next_pte;
> > +	while (1) {
> > +		if (check_pte(pcw))
> > +			return true;
> > +next_pte:	do {
> > +			pcw->address += PAGE_SIZE;
> > +			if (pcw->address >= __vma_address(pcw->page, pcw->vma) +
> > +					hpage_nr_pages(pcw->page) * PAGE_SIZE)
> > +				return not_found(pcw);
> > +			/* Did we cross page table boundary? */
> > +			if (pcw->address % PMD_SIZE == 0) {
> > +				pte_unmap(pcw->pte);
> > +				if (pcw->ptl) {
> > +					spin_unlock(pcw->ptl);
> > +					pcw->ptl = NULL;
> > +				}
> > +				goto restart;
> > +			} else {
> > +				pcw->pte++;
> > +			}
> > +		} while (pte_none(*pcw->pte));
> > +
> > +		if (!pcw->ptl) {
> > +			pcw->ptl = pte_lockptr(mm, pcw->pmd);
> > +			spin_lock(pcw->ptl);
> > +		}
> > +	}
> > +}
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
@ 2017-01-24 22:50       ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 22:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel

On Tue, Jan 24, 2017 at 01:41:22PM -0800, Andrew Morton wrote:
> On Tue, 24 Jan 2017 19:28:14 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > The patch introduce new interface to check if a page is mapped into a vma.
> > It aims to address shortcomings of page_check_address{,_transhuge}.
> > 
> > Existing interface is not able to handle PTE-mapped THPs: it only finds
> > the first PTE. The rest lefted unnoticed.
> > 
> > page_check_walk() iterates over all possible mapping of the page in the
> > vma.
> 
> I really don't like the name page_check_walk().  "check" could mean any
> damn thing.  Something like page_vma_mapped_walk() has meaning.  We
> could omit the "_walk" for brevity.

page_vma_mapped() would sound as predicate.
I'll rename to page_vma_mapped_walk().

> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  include/linux/rmap.h |  65 ++++++++++++++++++++++
> >  mm/Makefile          |   6 ++-
> >  mm/huge_memory.c     |   9 ++--
> >  mm/page_check.c      | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 223 insertions(+), 5 deletions(-)
> >  create mode 100644 mm/page_check.c
> > 
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 15321fb1df6b..474279810742 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -232,6 +232,71 @@ static inline bool page_check_address_transhuge(struct page *page,
> >  }
> >  #endif
> >  
> > +/* Avoid racy checks */
> > +#define PAGE_CHECK_WALK_SYNC		(1 << 0)
> > +/* Look for migarion entries rather than present ptes */
> > +#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
> > +
> > +struct page_check_walk {
> > +	struct page *page;
> > +	struct vm_area_struct *vma;
> > +	unsigned long address;
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +	spinlock_t *ptl;
> > +	unsigned int flags;
> > +};
> 
> One thing which I don't think was documented is that it is the caller's
> responsibility to initialize this appropriately before calling
> page_check_walk().  At least, .pte and .ptl must be NULL, for
> page_check_walk_done().

Okay, I'll write it down.

> > +static inline void page_check_walk_done(struct page_check_walk *pcw)
> > +{
> > +	if (pcw->pte)
> > +		pte_unmap(pcw->pte);
> > +	if (pcw->ptl)
> > +		spin_unlock(pcw->ptl);
> > +}
> > +
> > +bool __page_check_walk(struct page_check_walk *pcw);
> > +
> > +/**
> > + * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address
> > + * @pcw: pointer to struce page_check_walk. page, vma and address must be set.
> 
> "struct"
> 
> > + *
> > + * Returns true, if the page is mapped in the vma. @pcw->pmd and @pcw->pte point
> 
> "Returns true if"
> 
> > + * to relevant page table entries. @pcw->ptl is locked. @pcw->address is
> > + * adjusted if needed (for PTE-mapped THPs).
> > + *
> > + * If @pcw->pmd is set, but @pcw->pte is not, you have found PMD-mapped page
> 
> "is set but"
> 
> > + * (usually THP). For PTE-mapped THP, you should run page_check_walk() in 
> > + * a loop to find all PTEs that maps the THP.
> 
> "that map"
> 
> > + *
> > + * For HugeTLB pages, @pcw->pte is set to relevant page table entry regardless
> 
> "set to the relevant", "regardless of"
> 
> > + * which page table level the page mapped at. @pcw->pmd is NULL.
> 
> "the page is"
> 
> > + *
> > + * Retruns false, if there's no more page table entries for the page in the vma.
> 
> "Returns false if there are"

Thanks for corrections. Applied.

> > + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> > + *
> > + * If you need to stop the walk before page_check_walk() returned false, use
> > + * page_check_walk_done(). It will do the housekeeping.
> > + */
> > +static inline bool page_check_walk(struct page_check_walk *pcw)
> > +{
> > +	/* The only possible pmd mapping has been handled on last iteration */
> > +	if (pcw->pmd && !pcw->pte) {
> > +		page_check_walk_done(pcw);
> > +		return false;
> > +	}
> > +
> > +	/* Only for THP, seek to next pte entry makes sense */
> > +	if (pcw->pte) {
> > +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> > +			page_check_walk_done(pcw);
> > +			return false;
> > +		}
> > +	}
> > +
> > +	return __page_check_walk(pcw);
> > +}
> 
> Was the decision to inline this a correct one?

Well, my logic was that in most cases we would have exactly one iteration.
The only case when we need more than one iteration is PTE-mapped THP which
is rare.
I hoped to avoid additional function call. Not sure if it worth it.

Should I move it inside the function?

> > --- /dev/null
> > +++ b/mm/page_check.c
> > @@ -0,0 +1,148 @@
> > +#include <linux/mm.h>
> > +#include <linux/rmap.h>
> > +#include <linux/hugetlb.h>
> > +#include <linux/swap.h>
> > +#include <linux/swapops.h>
> > +
> > +#include "internal.h"
> > +
> > +static inline bool check_pmd(struct page_check_walk *pcw)
> > +{
> > +	pmd_t pmde = *pcw->pmd;
> > +	barrier();
> > +	return pmd_present(pmde) && !pmd_trans_huge(pmde);
> > +}
> 
> Can we please have a comment explaining what the barrier() does?

I copied it from page_check_address_transhuge().

I think we can get aways with READ_ONCE() instead.
I'll add a comment.

> > +static inline bool not_found(struct page_check_walk *pcw)
> > +{
> > +	page_check_walk_done(pcw);
> > +	return false;
> > +}
> > +
> > +static inline bool map_pte(struct page_check_walk *pcw)
> > +{
> > +	pcw->pte = pte_offset_map(pcw->pmd, pcw->address);
> > +	if (!(pcw->flags & PAGE_CHECK_WALK_SYNC)) {
> > +		if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> > +			if (!is_swap_pte(*pcw->pte))
> > +				return false;
> > +		} else {
> > +			if (!pte_present(*pcw->pte))
> > +				return false;
> > +		}
> > +	}
> > +	pcw->ptl = pte_lockptr(pcw->vma->vm_mm, pcw->pmd);
> > +	spin_lock(pcw->ptl);
> > +	return true;
> > +}
> 
> The compiler will just ignore all these "inline" statements.

These helpers grew during development. I forgot to drop "inline".

> > +static inline bool check_pte(struct page_check_walk *pcw)
> > +{
> > +	if (pcw->flags & PAGE_CHECK_WALK_MIGRATION) {
> > +		swp_entry_t entry;
> > +		if (!is_swap_pte(*pcw->pte))
> > +			return false;
> > +		entry = pte_to_swp_entry(*pcw->pte);
> > +		if (!is_migration_entry(entry))
> > +			return false;
> > +		if (migration_entry_to_page(entry) - pcw->page >=
> > +				hpage_nr_pages(pcw->page)) {
> > +			return false;
> > +		}
> > +		if (migration_entry_to_page(entry) < pcw->page)
> > +			return false;
> > +	} else {
> > +		if (!pte_present(*pcw->pte))
> > +			return false;
> > +
> > +		/* THP can be referenced by any subpage */
> > +		if (pte_page(*pcw->pte) - pcw->page >=
> > +				hpage_nr_pages(pcw->page)) {
> > +			return false;
> > +		}
> > +		if (pte_page(*pcw->pte) < pcw->page)
> > +			return false;
> > +	}
> > +
> > +	return true;
> > +}
> 
> Thankfully, because inlining this one does seem inappropriate - it's
> enormous!
> 
> > +bool __page_check_walk(struct page_check_walk *pcw)
> > +{
> > +	struct mm_struct *mm = pcw->vma->vm_mm;
> > +	struct page *page = pcw->page;
> > +	pgd_t *pgd;
> > +	pud_t *pud;
> > +
> > +	/* For THP, seek to next pte entry */
> > +	if (pcw->pte)
> > +		goto next_pte;
> > +
> > +	if (unlikely(PageHuge(pcw->page))) {
> > +		/* when pud is not present, pte will be NULL */
> > +		pcw->pte = huge_pte_offset(mm, pcw->address);
> > +		if (!pcw->pte)
> > +			return false;
> > +
> > +		pcw->ptl = huge_pte_lockptr(page_hstate(page), mm, pcw->pte);
> > +		spin_lock(pcw->ptl);
> > +		if (!check_pte(pcw))
> > +			return not_found(pcw);
> > +		return true;
> > +	}
> > +restart:
> > +	pgd = pgd_offset(mm, pcw->address);
> > +	if (!pgd_present(*pgd))
> > +		return false;
> > +	pud = pud_offset(pgd, pcw->address);
> > +	if (!pud_present(*pud))
> > +		return false;
> > +	pcw->pmd = pmd_offset(pud, pcw->address);
> > +	if (pmd_trans_huge(*pcw->pmd)) {
> > +		pcw->ptl = pmd_lock(mm, pcw->pmd);
> > +		if (!pmd_present(*pcw->pmd))
> > +			return not_found(pcw);
> > +		if (likely(pmd_trans_huge(*pcw->pmd))) {
> > +			if (pcw->flags & PAGE_CHECK_WALK_MIGRATION)
> > +				return not_found(pcw);
> > +			if (pmd_page(*pcw->pmd) != page)
> > +				return not_found(pcw);
> > +			return true;
> > +		} else {
> > +			/* THP pmd was split under us: handle on pte level */
> > +			spin_unlock(pcw->ptl);
> > +			pcw->ptl = NULL;
> > +		}
> > +	} else {
> > +		if (!check_pmd(pcw))
> > +			return false;
> > +	}
> > +	if (!map_pte(pcw))
> > +		goto next_pte;
> > +	while (1) {
> > +		if (check_pte(pcw))
> > +			return true;
> > +next_pte:	do {
> > +			pcw->address += PAGE_SIZE;
> > +			if (pcw->address >= __vma_address(pcw->page, pcw->vma) +
> > +					hpage_nr_pages(pcw->page) * PAGE_SIZE)
> > +				return not_found(pcw);
> > +			/* Did we cross page table boundary? */
> > +			if (pcw->address % PMD_SIZE == 0) {
> > +				pte_unmap(pcw->pte);
> > +				if (pcw->ptl) {
> > +					spin_unlock(pcw->ptl);
> > +					pcw->ptl = NULL;
> > +				}
> > +				goto restart;
> > +			} else {
> > +				pcw->pte++;
> > +			}
> > +		} while (pte_none(*pcw->pte));
> > +
> > +		if (!pcw->ptl) {
> > +			pcw->ptl = pte_lockptr(mm, pcw->pmd);
> > +			spin_lock(pcw->ptl);
> > +		}
> > +	}
> > +}
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
  2017-01-24 22:50       ` Kirill A. Shutemov
@ 2017-01-24 22:55         ` Andrew Morton
  -1 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 22:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel

On Wed, 25 Jan 2017 01:50:30 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> > > + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> > > + *
> > > + * If you need to stop the walk before page_check_walk() returned false, use
> > > + * page_check_walk_done(). It will do the housekeeping.
> > > + */
> > > +static inline bool page_check_walk(struct page_check_walk *pcw)
> > > +{
> > > +	/* The only possible pmd mapping has been handled on last iteration */
> > > +	if (pcw->pmd && !pcw->pte) {
> > > +		page_check_walk_done(pcw);
> > > +		return false;
> > > +	}
> > > +
> > > +	/* Only for THP, seek to next pte entry makes sense */
> > > +	if (pcw->pte) {
> > > +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> > > +			page_check_walk_done(pcw);
> > > +			return false;
> > > +		}
> > > +	}
> > > +
> > > +	return __page_check_walk(pcw);
> > > +}
> > 
> > Was the decision to inline this a correct one?
> 
> Well, my logic was that in most cases we would have exactly one iteration.
> The only case when we need more than one iteration is PTE-mapped THP which
> is rare.
> I hoped to avoid additional function call. Not sure if it worth it.
> 
> Should I move it inside the function?

I suggest building a kernel with it uninlined, take a look at the bloat
factor then make a seat-of-the pants decision about "is it worth it". 
With quite a few callsites the saving from uninlining may be
significant.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
@ 2017-01-24 22:55         ` Andrew Morton
  0 siblings, 0 replies; 71+ messages in thread
From: Andrew Morton @ 2017-01-24 22:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel

On Wed, 25 Jan 2017 01:50:30 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> > > + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> > > + *
> > > + * If you need to stop the walk before page_check_walk() returned false, use
> > > + * page_check_walk_done(). It will do the housekeeping.
> > > + */
> > > +static inline bool page_check_walk(struct page_check_walk *pcw)
> > > +{
> > > +	/* The only possible pmd mapping has been handled on last iteration */
> > > +	if (pcw->pmd && !pcw->pte) {
> > > +		page_check_walk_done(pcw);
> > > +		return false;
> > > +	}
> > > +
> > > +	/* Only for THP, seek to next pte entry makes sense */
> > > +	if (pcw->pte) {
> > > +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> > > +			page_check_walk_done(pcw);
> > > +			return false;
> > > +		}
> > > +	}
> > > +
> > > +	return __page_check_walk(pcw);
> > > +}
> > 
> > Was the decision to inline this a correct one?
> 
> Well, my logic was that in most cases we would have exactly one iteration.
> The only case when we need more than one iteration is PTE-mapped THP which
> is rare.
> I hoped to avoid additional function call. Not sure if it worth it.
> 
> Should I move it inside the function?

I suggest building a kernel with it uninlined, take a look at the bloat
factor then make a seat-of-the pants decision about "is it worth it". 
With quite a few callsites the saving from uninlining may be
significant.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 22:35         ` Andrew Morton
@ 2017-01-24 22:56           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 22:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, Jan 24, 2017 at 02:35:59PM -0800, Andrew Morton wrote:
> On Wed, 25 Jan 2017 01:22:17 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Tue, Jan 24, 2017 at 01:28:49PM -0800, Andrew Morton wrote:
> > > On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > > 
> > > > For THPs page_check_address() always fails. It's better to split them
> > > > first before trying to replace.
> > > 
> > > So what does this mean.  uprobes simply fails to work when trying to
> > > place a probe into a THP memory region?
> > 
> > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > 
> > > How come nobody noticed (and reported) this when using the feature?
> > 
> > I guess it's not often used for anon memory.
> 
> OK,  can we please include discussion of these things in the changelog?

Okay, I'll try to come up with a test case too.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-24 22:56           ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-24 22:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, Jan 24, 2017 at 02:35:59PM -0800, Andrew Morton wrote:
> On Wed, 25 Jan 2017 01:22:17 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Tue, Jan 24, 2017 at 01:28:49PM -0800, Andrew Morton wrote:
> > > On Tue, 24 Jan 2017 19:28:13 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > > 
> > > > For THPs page_check_address() always fails. It's better to split them
> > > > first before trying to replace.
> > > 
> > > So what does this mean.  uprobes simply fails to work when trying to
> > > place a probe into a THP memory region?
> > 
> > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > 
> > > How come nobody noticed (and reported) this when using the feature?
> > 
> > I guess it's not often used for anon memory.
> 
> OK,  can we please include discussion of these things in the changelog?

Okay, I'll try to come up with a test case too.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-25  1:19     ` kbuild test robot
  -1 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 2349 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/pgtable.h:471:0,
                    from include/linux/mm.h:68,
                    from include/linux/ring_buffer.h:5,
                    from include/linux/trace_events.h:5,
                    from include/trace/syscall.h:6,
                    from include/linux/syscalls.h:81,
                    from init/main.c:18:
   include/linux/rmap.h: In function 'page_check_walk_done':
>> arch/x86/include/asm/pgtable_32.h:53:24: error: implicit declaration of function 'kunmap_atomic' [-Werror=implicit-function-declaration]
    #define pte_unmap(pte) kunmap_atomic((pte))
                           ^
>> include/linux/rmap.h:253:3: note: in expansion of macro 'pte_unmap'
      pte_unmap(pcw->pte);
      ^~~~~~~~~
   cc1: some warnings being treated as errors

vim +/pte_unmap +253 include/linux/rmap.h

   237	/* Look for migarion entries rather than present ptes */
   238	#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
   239	
   240	struct page_check_walk {
   241		struct page *page;
   242		struct vm_area_struct *vma;
   243		unsigned long address;
   244		pmd_t *pmd;
   245		pte_t *pte;
   246		spinlock_t *ptl;
   247		unsigned int flags;
   248	};
   249	
   250	static inline void page_check_walk_done(struct page_check_walk *pcw)
   251	{
   252		if (pcw->pte)
 > 253			pte_unmap(pcw->pte);
   254		if (pcw->ptl)
   255			spin_unlock(pcw->ptl);
   256	}
   257	
   258	bool __page_check_walk(struct page_check_walk *pcw);
   259	
   260	/**
   261	 * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25587 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
@ 2017-01-25  1:19     ` kbuild test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2349 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/pgtable.h:471:0,
                    from include/linux/mm.h:68,
                    from include/linux/ring_buffer.h:5,
                    from include/linux/trace_events.h:5,
                    from include/trace/syscall.h:6,
                    from include/linux/syscalls.h:81,
                    from init/main.c:18:
   include/linux/rmap.h: In function 'page_check_walk_done':
>> arch/x86/include/asm/pgtable_32.h:53:24: error: implicit declaration of function 'kunmap_atomic' [-Werror=implicit-function-declaration]
    #define pte_unmap(pte) kunmap_atomic((pte))
                           ^
>> include/linux/rmap.h:253:3: note: in expansion of macro 'pte_unmap'
      pte_unmap(pcw->pte);
      ^~~~~~~~~
   cc1: some warnings being treated as errors

vim +/pte_unmap +253 include/linux/rmap.h

   237	/* Look for migarion entries rather than present ptes */
   238	#define PAGE_CHECK_WALK_MIGRATION	(1 << 1)
   239	
   240	struct page_check_walk {
   241		struct page *page;
   242		struct vm_area_struct *vma;
   243		unsigned long address;
   244		pmd_t *pmd;
   245		pte_t *pte;
   246		spinlock_t *ptl;
   247		unsigned int flags;
   248	};
   249	
   250	static inline void page_check_walk_done(struct page_check_walk *pcw)
   251	{
   252		if (pcw->pte)
 > 253			pte_unmap(pcw->pte);
   254		if (pcw->ptl)
   255			spin_unlock(pcw->ptl);
   256	}
   257	
   258	bool __page_check_walk(struct page_check_walk *pcw);
   259	
   260	/**
   261	 * page_check_walk - check if @pcw->page is mapped in @pcw->vma at @pcw->address

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25587 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 06/12] mm: convert page_mkclean_one() to page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-25  1:44     ` kbuild test robot
  -1 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'page_mkclean_one':
   mm/rmap.c:1048:4: error: implicit declaration of function 'pmd_dirty'
   mm/rmap.c:1053:4: error: implicit declaration of function 'pmd_wrprotect'
>> mm/rmap.c:1053:10: error: incompatible types when assigning to type 'pmd_t' from type 'int'
   mm/rmap.c:1054:4: error: implicit declaration of function 'pmd_mkclean'
   mm/rmap.c:1054:10: error: incompatible types when assigning to type 'pmd_t' from type 'int'
   mm/rmap.c:1055:4: error: implicit declaration of function 'set_pmd_at'

vim +1053 mm/rmap.c

  1042				set_pte_at(vma->vm_mm, address, pte, entry);
  1043				ret = 1;
  1044			} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE)) {
  1045				pmd_t *pmd = pcw.pmd;
  1046				pmd_t entry;
  1047	
> 1048				if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
  1049					continue;
  1050	
  1051				flush_cache_page(vma, address, page_to_pfn(page));
  1052				entry = pmdp_huge_clear_flush(vma, address, pmd);
> 1053				entry = pmd_wrprotect(entry);
  1054				entry = pmd_mkclean(entry);
  1055				set_pmd_at(vma->vm_mm, address, pmd, entry);
  1056				ret = 1;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7397 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 06/12] mm: convert page_mkclean_one() to page_check_walk()
@ 2017-01-25  1:44     ` kbuild test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'page_mkclean_one':
   mm/rmap.c:1048:4: error: implicit declaration of function 'pmd_dirty'
   mm/rmap.c:1053:4: error: implicit declaration of function 'pmd_wrprotect'
>> mm/rmap.c:1053:10: error: incompatible types when assigning to type 'pmd_t' from type 'int'
   mm/rmap.c:1054:4: error: implicit declaration of function 'pmd_mkclean'
   mm/rmap.c:1054:10: error: incompatible types when assigning to type 'pmd_t' from type 'int'
   mm/rmap.c:1055:4: error: implicit declaration of function 'set_pmd_at'

vim +1053 mm/rmap.c

  1042				set_pte_at(vma->vm_mm, address, pte, entry);
  1043				ret = 1;
  1044			} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE)) {
  1045				pmd_t *pmd = pcw.pmd;
  1046				pmd_t entry;
  1047	
> 1048				if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
  1049					continue;
  1050	
  1051				flush_cache_page(vma, address, page_to_pfn(page));
  1052				entry = pmdp_huge_clear_flush(vma, address, pmd);
> 1053				entry = pmd_wrprotect(entry);
  1054				entry = pmd_mkclean(entry);
  1055				set_pmd_at(vma->vm_mm, address, pmd, entry);
  1056				ret = 1;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7397 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 12/12] mm: convert remove_migration_pte() to page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-25  1:54     ` kbuild test robot
  -1 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 10707 bytes --]

Hi Kirill,

[auto build test WARNING on mmotm/master]
[also build test WARNING on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All warnings (new ones prefixed by >>):

   mm/migrate.c: In function 'remove_migration_pte':
>> mm/migrate.c:199:20: warning: unused variable 'mm'
   arch/openrisc/include/asm/bitops/atomic.h: Assembler messages:
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:90: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:92: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:90: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:92: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:70: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:72: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:70: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:72: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:70: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:72: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.

vim +/mm +199 mm/migrate.c

bda807d44 Minchan Kim        2016-07-26  183  			unlock_page(page);
bda807d44 Minchan Kim        2016-07-26  184  			put_page(page);
bda807d44 Minchan Kim        2016-07-26  185  		} else {
894bc3104 Lee Schermerhorn   2008-10-18  186  			putback_lru_page(page);
6afcf8ef0 Ming Ling          2016-12-12  187  			dec_node_page_state(page, NR_ISOLATED_ANON +
6afcf8ef0 Ming Ling          2016-12-12  188  					page_is_file_cache(page));
b20a35035 Christoph Lameter  2006-03-22  189  		}
b20a35035 Christoph Lameter  2006-03-22  190  	}
bda807d44 Minchan Kim        2016-07-26  191  }
b20a35035 Christoph Lameter  2006-03-22  192  
0697212a4 Christoph Lameter  2006-06-23  193  /*
0697212a4 Christoph Lameter  2006-06-23  194   * Restore a potential migration pte to a working pte entry
0697212a4 Christoph Lameter  2006-06-23  195   */
51b4efdf7 Kirill A. Shutemov 2017-01-24  196  static int remove_migration_pte(struct page *page, struct vm_area_struct *vma,
e9995ef97 Hugh Dickins       2009-12-14  197  				 unsigned long addr, void *old)
0697212a4 Christoph Lameter  2006-06-23  198  {
0697212a4 Christoph Lameter  2006-06-23 @199  	struct mm_struct *mm = vma->vm_mm;
51b4efdf7 Kirill A. Shutemov 2017-01-24  200  	struct page_check_walk pcw = {
51b4efdf7 Kirill A. Shutemov 2017-01-24  201  		.page = old,
51b4efdf7 Kirill A. Shutemov 2017-01-24  202  		.vma = vma,
51b4efdf7 Kirill A. Shutemov 2017-01-24  203  		.address = addr,
51b4efdf7 Kirill A. Shutemov 2017-01-24  204  		.flags = PAGE_CHECK_WALK_SYNC | PAGE_CHECK_WALK_MIGRATION,
51b4efdf7 Kirill A. Shutemov 2017-01-24  205  	};
51b4efdf7 Kirill A. Shutemov 2017-01-24  206  	struct page *new;
51b4efdf7 Kirill A. Shutemov 2017-01-24  207  	pte_t pte;

:::::: The code at line 199 was first introduced by commit
:::::: 0697212a411c1dae03c27845f2de2f3adb32c331 [PATCH] Swapless page migration: add R/W migration entries

:::::: TO: Christoph Lameter <clameter@sgi.com>
:::::: CC: Linus Torvalds <torvalds@g5.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7397 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 12/12] mm: convert remove_migration_pte() to page_check_walk()
@ 2017-01-25  1:54     ` kbuild test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 10707 bytes --]

Hi Kirill,

[auto build test WARNING on mmotm/master]
[also build test WARNING on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All warnings (new ones prefixed by >>):

   mm/migrate.c: In function 'remove_migration_pte':
>> mm/migrate.c:199:20: warning: unused variable 'mm'
   arch/openrisc/include/asm/bitops/atomic.h: Assembler messages:
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:90: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:92: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:90: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:92: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:70: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:72: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:70: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:72: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:70: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:72: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/cmpxchg.h:30: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/cmpxchg.h:34: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:18: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:20: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/bitops/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/bitops/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.
   arch/openrisc/include/asm/atomic.h:37: Error: unknown opcode2 `l.swa'.
   arch/openrisc/include/asm/atomic.h:35: Error: unknown opcode2 `l.lwa'.

vim +/mm +199 mm/migrate.c

bda807d44 Minchan Kim        2016-07-26  183  			unlock_page(page);
bda807d44 Minchan Kim        2016-07-26  184  			put_page(page);
bda807d44 Minchan Kim        2016-07-26  185  		} else {
894bc3104 Lee Schermerhorn   2008-10-18  186  			putback_lru_page(page);
6afcf8ef0 Ming Ling          2016-12-12  187  			dec_node_page_state(page, NR_ISOLATED_ANON +
6afcf8ef0 Ming Ling          2016-12-12  188  					page_is_file_cache(page));
b20a35035 Christoph Lameter  2006-03-22  189  		}
b20a35035 Christoph Lameter  2006-03-22  190  	}
bda807d44 Minchan Kim        2016-07-26  191  }
b20a35035 Christoph Lameter  2006-03-22  192  
0697212a4 Christoph Lameter  2006-06-23  193  /*
0697212a4 Christoph Lameter  2006-06-23  194   * Restore a potential migration pte to a working pte entry
0697212a4 Christoph Lameter  2006-06-23  195   */
51b4efdf7 Kirill A. Shutemov 2017-01-24  196  static int remove_migration_pte(struct page *page, struct vm_area_struct *vma,
e9995ef97 Hugh Dickins       2009-12-14  197  				 unsigned long addr, void *old)
0697212a4 Christoph Lameter  2006-06-23  198  {
0697212a4 Christoph Lameter  2006-06-23 @199  	struct mm_struct *mm = vma->vm_mm;
51b4efdf7 Kirill A. Shutemov 2017-01-24  200  	struct page_check_walk pcw = {
51b4efdf7 Kirill A. Shutemov 2017-01-24  201  		.page = old,
51b4efdf7 Kirill A. Shutemov 2017-01-24  202  		.vma = vma,
51b4efdf7 Kirill A. Shutemov 2017-01-24  203  		.address = addr,
51b4efdf7 Kirill A. Shutemov 2017-01-24  204  		.flags = PAGE_CHECK_WALK_SYNC | PAGE_CHECK_WALK_MIGRATION,
51b4efdf7 Kirill A. Shutemov 2017-01-24  205  	};
51b4efdf7 Kirill A. Shutemov 2017-01-24  206  	struct page *new;
51b4efdf7 Kirill A. Shutemov 2017-01-24  207  	pte_t pte;

:::::: The code at line 199 was first introduced by commit
:::::: 0697212a411c1dae03c27845f2de2f3adb32c331 [PATCH] Swapless page migration: add R/W migration entries

:::::: TO: Christoph Lameter <clameter@sgi.com>
:::::: CC: Linus Torvalds <torvalds@g5.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7397 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-25  1:59     ` kbuild test robot
  -1 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 1875 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sparc64-allnoconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   mm/page_check.c: In function 'check_pte':
>> mm/page_check.c:48:38: error: invalid operands to binary - (have 'void *' and 'struct page *')
      if (migration_entry_to_page(entry) - pcw->page >=
                                         ^ ~~~~~~~~~
>> mm/page_check.c:52:38: warning: comparison of distinct pointer types lacks a cast
      if (migration_entry_to_page(entry) < pcw->page)
                                         ^

vim +48 mm/page_check.c

    42			swp_entry_t entry;
    43			if (!is_swap_pte(*pcw->pte))
    44				return false;
    45			entry = pte_to_swp_entry(*pcw->pte);
    46			if (!is_migration_entry(entry))
    47				return false;
  > 48			if (migration_entry_to_page(entry) - pcw->page >=
    49					hpage_nr_pages(pcw->page)) {
    50				return false;
    51			}
  > 52			if (migration_entry_to_page(entry) < pcw->page)
    53				return false;
    54		} else {
    55			if (!pte_present(*pcw->pte))

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 5132 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
@ 2017-01-25  1:59     ` kbuild test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  1:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1875 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sparc64-allnoconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   mm/page_check.c: In function 'check_pte':
>> mm/page_check.c:48:38: error: invalid operands to binary - (have 'void *' and 'struct page *')
      if (migration_entry_to_page(entry) - pcw->page >=
                                         ^ ~~~~~~~~~
>> mm/page_check.c:52:38: warning: comparison of distinct pointer types lacks a cast
      if (migration_entry_to_page(entry) < pcw->page)
                                         ^

vim +48 mm/page_check.c

    42			swp_entry_t entry;
    43			if (!is_swap_pte(*pcw->pte))
    44				return false;
    45			entry = pte_to_swp_entry(*pcw->pte);
    46			if (!is_migration_entry(entry))
    47				return false;
  > 48			if (migration_entry_to_page(entry) - pcw->page >=
    49					hpage_nr_pages(pcw->page)) {
    50				return false;
    51			}
  > 52			if (migration_entry_to_page(entry) < pcw->page)
    53				return false;
    54		} else {
    55			if (!pte_present(*pcw->pte))

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 5132 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 06/12] mm: convert page_mkclean_one() to page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-25  2:00     ` kbuild test robot
  -1 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  2:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 2724 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'page_mkclean_one':
>> mm/rmap.c:1048:9: error: implicit declaration of function 'pmd_dirty' [-Werror=implicit-function-declaration]
       if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
            ^~~~~~~~~
>> mm/rmap.c:1053:12: error: implicit declaration of function 'pmd_wrprotect' [-Werror=implicit-function-declaration]
       entry = pmd_wrprotect(entry);
               ^~~~~~~~~~~~~
>> mm/rmap.c:1053:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_wrprotect(entry);
             ^
>> mm/rmap.c:1054:12: error: implicit declaration of function 'pmd_mkclean' [-Werror=implicit-function-declaration]
       entry = pmd_mkclean(entry);
               ^~~~~~~~~~~
   mm/rmap.c:1054:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_mkclean(entry);
             ^
>> mm/rmap.c:1055:4: error: implicit declaration of function 'set_pmd_at' [-Werror=implicit-function-declaration]
       set_pmd_at(vma->vm_mm, address, pmd, entry);
       ^~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/pmd_dirty +1048 mm/rmap.c

  1042				set_pte_at(vma->vm_mm, address, pte, entry);
  1043				ret = 1;
  1044			} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE)) {
  1045				pmd_t *pmd = pcw.pmd;
  1046				pmd_t entry;
  1047	
> 1048				if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
  1049					continue;
  1050	
  1051				flush_cache_page(vma, address, page_to_pfn(page));
  1052				entry = pmdp_huge_clear_flush(vma, address, pmd);
> 1053				entry = pmd_wrprotect(entry);
> 1054				entry = pmd_mkclean(entry);
> 1055				set_pmd_at(vma->vm_mm, address, pmd, entry);
  1056				ret = 1;
  1057			} else {
  1058				/* unexpected pmd-mapped page? */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 46039 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 06/12] mm: convert page_mkclean_one() to page_check_walk()
@ 2017-01-25  2:00     ` kbuild test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  2:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2724 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'page_mkclean_one':
>> mm/rmap.c:1048:9: error: implicit declaration of function 'pmd_dirty' [-Werror=implicit-function-declaration]
       if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
            ^~~~~~~~~
>> mm/rmap.c:1053:12: error: implicit declaration of function 'pmd_wrprotect' [-Werror=implicit-function-declaration]
       entry = pmd_wrprotect(entry);
               ^~~~~~~~~~~~~
>> mm/rmap.c:1053:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_wrprotect(entry);
             ^
>> mm/rmap.c:1054:12: error: implicit declaration of function 'pmd_mkclean' [-Werror=implicit-function-declaration]
       entry = pmd_mkclean(entry);
               ^~~~~~~~~~~
   mm/rmap.c:1054:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_mkclean(entry);
             ^
>> mm/rmap.c:1055:4: error: implicit declaration of function 'set_pmd_at' [-Werror=implicit-function-declaration]
       set_pmd_at(vma->vm_mm, address, pmd, entry);
       ^~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/pmd_dirty +1048 mm/rmap.c

  1042				set_pte_at(vma->vm_mm, address, pte, entry);
  1043				ret = 1;
  1044			} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE)) {
  1045				pmd_t *pmd = pcw.pmd;
  1046				pmd_t entry;
  1047	
> 1048				if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
  1049					continue;
  1050	
  1051				flush_cache_page(vma, address, page_to_pfn(page));
  1052				entry = pmdp_huge_clear_flush(vma, address, pmd);
> 1053				entry = pmd_wrprotect(entry);
> 1054				entry = pmd_mkclean(entry);
> 1055				set_pmd_at(vma->vm_mm, address, pmd, entry);
  1056				ret = 1;
  1057			} else {
  1058				/* unexpected pmd-mapped page? */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 46039 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 07/12] mm: convert try_to_unmap_one() to page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-25  3:13     ` kbuild test robot
  -1 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  3:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sh-titan_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'page_mkclean_one':
   mm/rmap.c:1046:9: error: implicit declaration of function 'pmd_dirty' [-Werror=implicit-function-declaration]
       if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
            ^~~~~~~~~
   mm/rmap.c:1051:12: error: implicit declaration of function 'pmd_wrprotect' [-Werror=implicit-function-declaration]
       entry = pmd_wrprotect(entry);
               ^~~~~~~~~~~~~
   mm/rmap.c:1051:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_wrprotect(entry);
             ^
   mm/rmap.c:1052:12: error: implicit declaration of function 'pmd_mkclean' [-Werror=implicit-function-declaration]
       entry = pmd_mkclean(entry);
               ^~~~~~~~~~~
   mm/rmap.c:1052:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_mkclean(entry);
             ^
   mm/rmap.c:1053:4: error: implicit declaration of function 'set_pmd_at' [-Werror=implicit-function-declaration]
       set_pmd_at(vma->vm_mm, address, pmd, entry);
       ^~~~~~~~~~
   mm/rmap.c: In function 'try_to_unmap_one':
>> mm/rmap.c:1518:34: error: implicit declaration of function 'pte_to_pfn' [-Werror=implicit-function-declaration]
      flush_cache_page(vma, address, pte_to_pfn(pcw.pte));
                                     ^~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/pte_to_pfn +1518 mm/rmap.c

  1512					page_check_walk_done(&pcw);
  1513					break;
  1514				}
  1515			}
  1516	
  1517			/* Nuke the page table entry. */
> 1518			flush_cache_page(vma, address, pte_to_pfn(pcw.pte));
  1519			if (should_defer_flush(mm, flags)) {
  1520				/*
  1521				 * We clear the PTE but do not flush so potentially

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 16229 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 07/12] mm: convert try_to_unmap_one() to page_check_walk()
@ 2017-01-25  3:13     ` kbuild test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kbuild test robot @ 2017-01-25  3:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc5 next-20170124]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Fix-few-rmap-related-THP-bugs/20170125-081918
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sh-titan_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'page_mkclean_one':
   mm/rmap.c:1046:9: error: implicit declaration of function 'pmd_dirty' [-Werror=implicit-function-declaration]
       if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
            ^~~~~~~~~
   mm/rmap.c:1051:12: error: implicit declaration of function 'pmd_wrprotect' [-Werror=implicit-function-declaration]
       entry = pmd_wrprotect(entry);
               ^~~~~~~~~~~~~
   mm/rmap.c:1051:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_wrprotect(entry);
             ^
   mm/rmap.c:1052:12: error: implicit declaration of function 'pmd_mkclean' [-Werror=implicit-function-declaration]
       entry = pmd_mkclean(entry);
               ^~~~~~~~~~~
   mm/rmap.c:1052:10: error: incompatible types when assigning to type 'pmd_t {aka struct <anonymous>}' from type 'int'
       entry = pmd_mkclean(entry);
             ^
   mm/rmap.c:1053:4: error: implicit declaration of function 'set_pmd_at' [-Werror=implicit-function-declaration]
       set_pmd_at(vma->vm_mm, address, pmd, entry);
       ^~~~~~~~~~
   mm/rmap.c: In function 'try_to_unmap_one':
>> mm/rmap.c:1518:34: error: implicit declaration of function 'pte_to_pfn' [-Werror=implicit-function-declaration]
      flush_cache_page(vma, address, pte_to_pfn(pcw.pte));
                                     ^~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/pte_to_pfn +1518 mm/rmap.c

  1512					page_check_walk_done(&pcw);
  1513					break;
  1514				}
  1515			}
  1516	
  1517			/* Nuke the page table entry. */
> 1518			flush_cache_page(vma, address, pte_to_pfn(pcw.pte));
  1519			if (should_defer_flush(mm, flags)) {
  1520				/*
  1521				 * We clear the PTE but do not flush so potentially

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 16229 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 22:22       ` Kirill A. Shutemov
@ 2017-01-25 16:55         ` Srikar Dronamraju
  -1 siblings, 0 replies; 71+ messages in thread
From: Srikar Dronamraju @ 2017-01-25 16:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Hugh Dickins, Rik van Riel, linux-mm, linux-kernel,
	Oleg Nesterov, Peter Zijlstra

> > 
> > > For THPs page_check_address() always fails. It's better to split them
> > > first before trying to replace.
> > 
> > So what does this mean.  uprobes simply fails to work when trying to
> > place a probe into a THP memory region?
> 
> Looks like we can end up with endless retry loop in uprobe_write_opcode().
> 
> > How come nobody noticed (and reported) this when using the feature?
> 
> I guess it's not often used for anon memory.
> 

The first time the breakpoint is hit on a page, it replaces the text
page with anon page.  Now lets assume we insert breakpoints in all the
pages in a range. Here each page is individually replaced by a non THP
anonpage. (since we dont have bulk breakpoint insertion support,
breakpoint insertion happens one at a time). Now the only interesting
case may be when each of these replaced pages happen to be physically
contiguous so that THP kicks in to replace all of these pages with one
THP page. Can happen in practice?

Are there any other cases that I have missed?

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-25 16:55         ` Srikar Dronamraju
  0 siblings, 0 replies; 71+ messages in thread
From: Srikar Dronamraju @ 2017-01-25 16:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Hugh Dickins, Rik van Riel, linux-mm, linux-kernel,
	Oleg Nesterov, Peter Zijlstra

> > 
> > > For THPs page_check_address() always fails. It's better to split them
> > > first before trying to replace.
> > 
> > So what does this mean.  uprobes simply fails to work when trying to
> > place a probe into a THP memory region?
> 
> Looks like we can end up with endless retry loop in uprobe_write_opcode().
> 
> > How come nobody noticed (and reported) this when using the feature?
> 
> I guess it's not often used for anon memory.
> 

The first time the breakpoint is hit on a page, it replaces the text
page with anon page.  Now lets assume we insert breakpoints in all the
pages in a range. Here each page is individually replaced by a non THP
anonpage. (since we dont have bulk breakpoint insertion support,
breakpoint insertion happens one at a time). Now the only interesting
case may be when each of these replaced pages happen to be physically
contiguous so that THP kicks in to replace all of these pages with one
THP page. Can happen in practice?

Are there any other cases that I have missed?

-- 
Thanks and Regards
Srikar Dronamraju

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-25 16:55         ` Srikar Dronamraju
  (?)
@ 2017-01-25 17:44         ` Rik van Riel
  -1 siblings, 0 replies; 71+ messages in thread
From: Rik van Riel @ 2017-01-25 17:44 UTC (permalink / raw)
  To: Srikar Dronamraju, Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Hugh Dickins, linux-mm, linux-kernel, Oleg Nesterov,
	Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 1424 bytes --]

On Wed, 2017-01-25 at 08:55 -0800, Srikar Dronamraju wrote:
> > 
> > > 
> > > 
> > > > 
> > > > For THPs page_check_address() always fails. It's better to
> > > > split them
> > > > first before trying to replace.
> > > So what does this mean.  uprobes simply fails to work when trying
> > > to
> > > place a probe into a THP memory region?
> > Looks like we can end up with endless retry loop in
> > uprobe_write_opcode().
> > 
> > > 
> > > How come nobody noticed (and reported) this when using the
> > > feature?
> > I guess it's not often used for anon memory.
> > 
> The first time the breakpoint is hit on a page, it replaces the text
> page with anon page.  Now lets assume we insert breakpoints in all
> the
> pages in a range. Here each page is individually replaced by a non
> THP
> anonpage. (since we dont have bulk breakpoint insertion support,
> breakpoint insertion happens one at a time). Now the only interesting
> case may be when each of these replaced pages happen to be physically
> contiguous so that THP kicks in to replace all of these pages with
> one
> THP page. Can happen in practice?
> 
> Are there any other cases that I have missed?

A JIT compiler placing executable code in anonymous
memory before executing it, and a debugger trying to
insert a uprobe in one of those areas?

Not common, but I suppose it could be done.

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-25 16:55         ` Srikar Dronamraju
@ 2017-01-25 17:44           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-25 17:44 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Hugh Dickins, Rik van Riel, linux-mm, linux-kernel,
	Oleg Nesterov, Peter Zijlstra

On Wed, Jan 25, 2017 at 08:55:22AM -0800, Srikar Dronamraju wrote:
> > > 
> > > > For THPs page_check_address() always fails. It's better to split them
> > > > first before trying to replace.
> > > 
> > > So what does this mean.  uprobes simply fails to work when trying to
> > > place a probe into a THP memory region?
> > 
> > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > 
> > > How come nobody noticed (and reported) this when using the feature?
> > 
> > I guess it's not often used for anon memory.
> > 
> 
> The first time the breakpoint is hit on a page, it replaces the text
> page with anon page.  Now lets assume we insert breakpoints in all the
> pages in a range. Here each page is individually replaced by a non THP
> anonpage. (since we dont have bulk breakpoint insertion support,
> breakpoint insertion happens one at a time). Now the only interesting
> case may be when each of these replaced pages happen to be physically
> contiguous so that THP kicks in to replace all of these pages with one
> THP page. Can happen in practice?

The problem is with the page you try to replace, not with page that you
replace it with.

> Are there any other cases that I have missed?

The binary on tmpfs with huge pages. I wrote test-case that triggers the
problem.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-25 17:44           ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-25 17:44 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Hugh Dickins, Rik van Riel, linux-mm, linux-kernel,
	Oleg Nesterov, Peter Zijlstra

On Wed, Jan 25, 2017 at 08:55:22AM -0800, Srikar Dronamraju wrote:
> > > 
> > > > For THPs page_check_address() always fails. It's better to split them
> > > > first before trying to replace.
> > > 
> > > So what does this mean.  uprobes simply fails to work when trying to
> > > place a probe into a THP memory region?
> > 
> > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > 
> > > How come nobody noticed (and reported) this when using the feature?
> > 
> > I guess it's not often used for anon memory.
> > 
> 
> The first time the breakpoint is hit on a page, it replaces the text
> page with anon page.  Now lets assume we insert breakpoints in all the
> pages in a range. Here each page is individually replaced by a non THP
> anonpage. (since we dont have bulk breakpoint insertion support,
> breakpoint insertion happens one at a time). Now the only interesting
> case may be when each of these replaced pages happen to be physically
> contiguous so that THP kicks in to replace all of these pages with one
> THP page. Can happen in practice?

The problem is with the page you try to replace, not with page that you
replace it with.

> Are there any other cases that I have missed?

The binary on tmpfs with huge pages. I wrote test-case that triggers the
problem.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
  2017-01-24 22:55         ` Andrew Morton
@ 2017-01-25 17:53           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-25 17:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel

On Tue, Jan 24, 2017 at 02:55:13PM -0800, Andrew Morton wrote:
> On Wed, 25 Jan 2017 01:50:30 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > > > + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> > > > + *
> > > > + * If you need to stop the walk before page_check_walk() returned false, use
> > > > + * page_check_walk_done(). It will do the housekeeping.
> > > > + */
> > > > +static inline bool page_check_walk(struct page_check_walk *pcw)
> > > > +{
> > > > +	/* The only possible pmd mapping has been handled on last iteration */
> > > > +	if (pcw->pmd && !pcw->pte) {
> > > > +		page_check_walk_done(pcw);
> > > > +		return false;
> > > > +	}
> > > > +
> > > > +	/* Only for THP, seek to next pte entry makes sense */
> > > > +	if (pcw->pte) {
> > > > +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> > > > +			page_check_walk_done(pcw);
> > > > +			return false;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return __page_check_walk(pcw);
> > > > +}
> > > 
> > > Was the decision to inline this a correct one?
> > 
> > Well, my logic was that in most cases we would have exactly one iteration.
> > The only case when we need more than one iteration is PTE-mapped THP which
> > is rare.
> > I hoped to avoid additional function call. Not sure if it worth it.
> > 
> > Should I move it inside the function?
> 
> I suggest building a kernel with it uninlined, take a look at the bloat
> factor then make a seat-of-the pants decision about "is it worth it". 
> With quite a few callsites the saving from uninlining may be
> significant.

add/remove: 1/2 grow/shrink: 8/0 up/down: 5089/-2954 (2135)
function                                     old     new   delta
__page_vma_mapped_walk                         -    2928   +2928
try_to_unmap_one                            2916    3218    +302
page_mkclean_one                             513     802    +289
__replace_page                              1439    1719    +280
page_referenced_one                          753    1030    +277
page_mapped_in_vma                           799    1059    +260
remove_migration_pte                        1129    1388    +259
page_idle_clear_pte_refs_one                 197     456    +259
write_protect_page                          1210    1445    +235
page_idle_clear_pte_refs_one.part             26       -     -26
page_vma_mapped_walk                        2928       -   -2928
Total: Before=37784555, After=37786690, chg +0.01%

I'll drop inlining. It would save ~2k.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 02/12] mm: introduce page_check_walk()
@ 2017-01-25 17:53           ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-25 17:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hugh Dickins, Rik van Riel,
	linux-mm, linux-kernel

On Tue, Jan 24, 2017 at 02:55:13PM -0800, Andrew Morton wrote:
> On Wed, 25 Jan 2017 01:50:30 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > > > + * @pcw->ptl is unlocked and @pcw->pte is unmapped.
> > > > + *
> > > > + * If you need to stop the walk before page_check_walk() returned false, use
> > > > + * page_check_walk_done(). It will do the housekeeping.
> > > > + */
> > > > +static inline bool page_check_walk(struct page_check_walk *pcw)
> > > > +{
> > > > +	/* The only possible pmd mapping has been handled on last iteration */
> > > > +	if (pcw->pmd && !pcw->pte) {
> > > > +		page_check_walk_done(pcw);
> > > > +		return false;
> > > > +	}
> > > > +
> > > > +	/* Only for THP, seek to next pte entry makes sense */
> > > > +	if (pcw->pte) {
> > > > +		if (!PageTransHuge(pcw->page) || PageHuge(pcw->page)) {
> > > > +			page_check_walk_done(pcw);
> > > > +			return false;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return __page_check_walk(pcw);
> > > > +}
> > > 
> > > Was the decision to inline this a correct one?
> > 
> > Well, my logic was that in most cases we would have exactly one iteration.
> > The only case when we need more than one iteration is PTE-mapped THP which
> > is rare.
> > I hoped to avoid additional function call. Not sure if it worth it.
> > 
> > Should I move it inside the function?
> 
> I suggest building a kernel with it uninlined, take a look at the bloat
> factor then make a seat-of-the pants decision about "is it worth it". 
> With quite a few callsites the saving from uninlining may be
> significant.

add/remove: 1/2 grow/shrink: 8/0 up/down: 5089/-2954 (2135)
function                                     old     new   delta
__page_vma_mapped_walk                         -    2928   +2928
try_to_unmap_one                            2916    3218    +302
page_mkclean_one                             513     802    +289
__replace_page                              1439    1719    +280
page_referenced_one                          753    1030    +277
page_mapped_in_vma                           799    1059    +260
remove_migration_pte                        1129    1388    +259
page_idle_clear_pte_refs_one                 197     456    +259
write_protect_page                          1210    1445    +235
page_idle_clear_pte_refs_one.part             26       -     -26
page_vma_mapped_walk                        2928       -   -2928
Total: Before=37784555, After=37786690, chg +0.01%

I'll drop inlining. It would save ~2k.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-25 18:22     ` Johannes Weiner
  -1 siblings, 0 replies; 71+ messages in thread
From: Johannes Weiner @ 2017-01-25 18:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, Jan 24, 2017 at 07:28:13PM +0300, Kirill A. Shutemov wrote:
> For THPs page_check_address() always fails. It's better to split them
> first before trying to replace.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-25 18:22     ` Johannes Weiner
  0 siblings, 0 replies; 71+ messages in thread
From: Johannes Weiner @ 2017-01-25 18:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton,
	linux-mm, linux-kernel, Oleg Nesterov, Peter Zijlstra

On Tue, Jan 24, 2017 at 07:28:13PM +0300, Kirill A. Shutemov wrote:
> For THPs page_check_address() always fails. It's better to split them
> first before trying to replace.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-25 16:55         ` Srikar Dronamraju
@ 2017-01-25 18:35           ` Johannes Weiner
  -1 siblings, 0 replies; 71+ messages in thread
From: Johannes Weiner @ 2017-01-25 18:35 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Kirill A. Shutemov, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

On Wed, Jan 25, 2017 at 08:55:22AM -0800, Srikar Dronamraju wrote:
> > > 
> > > > For THPs page_check_address() always fails. It's better to split them
> > > > first before trying to replace.
> > > 
> > > So what does this mean.  uprobes simply fails to work when trying to
> > > place a probe into a THP memory region?
> > 
> > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > 
> > > How come nobody noticed (and reported) this when using the feature?
> > 
> > I guess it's not often used for anon memory.
> > 
> 
> The first time the breakpoint is hit on a page, it replaces the text
> page with anon page.  Now lets assume we insert breakpoints in all the
> pages in a range. Here each page is individually replaced by a non THP
> anonpage. (since we dont have bulk breakpoint insertion support,
> breakpoint insertion happens one at a time). Now the only interesting
> case may be when each of these replaced pages happen to be physically
> contiguous so that THP kicks in to replace all of these pages with one
> THP page. Can happen in practice?
> 
> Are there any other cases that I have missed?

We use a hack in our applications where we open /proc/self/maps, copy
text segments to a staging area, then create overlay anon mappings on
top and copy the text back into them. Now we have THP-backed text and
very little iTLB pressure :-)

That said, we haven't run into the uprobes issue yet.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-25 18:35           ` Johannes Weiner
  0 siblings, 0 replies; 71+ messages in thread
From: Johannes Weiner @ 2017-01-25 18:35 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Kirill A. Shutemov, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

On Wed, Jan 25, 2017 at 08:55:22AM -0800, Srikar Dronamraju wrote:
> > > 
> > > > For THPs page_check_address() always fails. It's better to split them
> > > > first before trying to replace.
> > > 
> > > So what does this mean.  uprobes simply fails to work when trying to
> > > place a probe into a THP memory region?
> > 
> > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > 
> > > How come nobody noticed (and reported) this when using the feature?
> > 
> > I guess it's not often used for anon memory.
> > 
> 
> The first time the breakpoint is hit on a page, it replaces the text
> page with anon page.  Now lets assume we insert breakpoints in all the
> pages in a range. Here each page is individually replaced by a non THP
> anonpage. (since we dont have bulk breakpoint insertion support,
> breakpoint insertion happens one at a time). Now the only interesting
> case may be when each of these replaced pages happen to be physically
> contiguous so that THP kicks in to replace all of these pages with one
> THP page. Can happen in practice?
> 
> Are there any other cases that I have missed?

We use a hack in our applications where we open /proc/self/maps, copy
text segments to a staging area, then create overlay anon mappings on
top and copy the text back into them. Now we have THP-backed text and
very little iTLB pressure :-)

That said, we haven't run into the uprobes issue yet.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-25 18:35           ` Johannes Weiner
@ 2017-01-25 18:38             ` Kirill A. Shutemov
  -1 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-25 18:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Srikar Dronamraju, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

On Wed, Jan 25, 2017 at 01:35:10PM -0500, Johannes Weiner wrote:
> On Wed, Jan 25, 2017 at 08:55:22AM -0800, Srikar Dronamraju wrote:
> > > > 
> > > > > For THPs page_check_address() always fails. It's better to split them
> > > > > first before trying to replace.
> > > > 
> > > > So what does this mean.  uprobes simply fails to work when trying to
> > > > place a probe into a THP memory region?
> > > 
> > > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > > 
> > > > How come nobody noticed (and reported) this when using the feature?
> > > 
> > > I guess it's not often used for anon memory.
> > > 
> > 
> > The first time the breakpoint is hit on a page, it replaces the text
> > page with anon page.  Now lets assume we insert breakpoints in all the
> > pages in a range. Here each page is individually replaced by a non THP
> > anonpage. (since we dont have bulk breakpoint insertion support,
> > breakpoint insertion happens one at a time). Now the only interesting
> > case may be when each of these replaced pages happen to be physically
> > contiguous so that THP kicks in to replace all of these pages with one
> > THP page. Can happen in practice?
> > 
> > Are there any other cases that I have missed?
> 
> We use a hack in our applications where we open /proc/self/maps, copy
> text segments to a staging area, then create overlay anon mappings on
> top and copy the text back into them. Now we have THP-backed text and
> very little iTLB pressure :-)

Just use tmpfs with huge pages :)

> That said, we haven't run into the uprobes issue yet.

Is it possible to have uprobes in anon memory?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-25 18:38             ` Kirill A. Shutemov
  0 siblings, 0 replies; 71+ messages in thread
From: Kirill A. Shutemov @ 2017-01-25 18:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Srikar Dronamraju, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

On Wed, Jan 25, 2017 at 01:35:10PM -0500, Johannes Weiner wrote:
> On Wed, Jan 25, 2017 at 08:55:22AM -0800, Srikar Dronamraju wrote:
> > > > 
> > > > > For THPs page_check_address() always fails. It's better to split them
> > > > > first before trying to replace.
> > > > 
> > > > So what does this mean.  uprobes simply fails to work when trying to
> > > > place a probe into a THP memory region?
> > > 
> > > Looks like we can end up with endless retry loop in uprobe_write_opcode().
> > > 
> > > > How come nobody noticed (and reported) this when using the feature?
> > > 
> > > I guess it's not often used for anon memory.
> > > 
> > 
> > The first time the breakpoint is hit on a page, it replaces the text
> > page with anon page.  Now lets assume we insert breakpoints in all the
> > pages in a range. Here each page is individually replaced by a non THP
> > anonpage. (since we dont have bulk breakpoint insertion support,
> > breakpoint insertion happens one at a time). Now the only interesting
> > case may be when each of these replaced pages happen to be physically
> > contiguous so that THP kicks in to replace all of these pages with one
> > THP page. Can happen in practice?
> > 
> > Are there any other cases that I have missed?
> 
> We use a hack in our applications where we open /proc/self/maps, copy
> text segments to a staging area, then create overlay anon mappings on
> top and copy the text back into them. Now we have THP-backed text and
> very little iTLB pressure :-)

Just use tmpfs with huge pages :)

> That said, we haven't run into the uprobes issue yet.

Is it possible to have uprobes in anon memory?

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
  2017-01-25 18:35           ` Johannes Weiner
@ 2017-01-26  2:54             ` Srikar Dronamraju
  -1 siblings, 0 replies; 71+ messages in thread
From: Srikar Dronamraju @ 2017-01-26  2:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Johannes Weiner, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

> > 
> > The first time the breakpoint is hit on a page, it replaces the text
> > page with anon page.  Now lets assume we insert breakpoints in all the
> > pages in a range. Here each page is individually replaced by a non THP
> > anonpage. (since we dont have bulk breakpoint insertion support,
> > breakpoint insertion happens one at a time). Now the only interesting
> > case may be when each of these replaced pages happen to be physically
> > contiguous so that THP kicks in to replace all of these pages with one
> > THP page. Can happen in practice?
> > 
> > Are there any other cases that I have missed?
> 
> We use a hack in our applications where we open /proc/self/maps, copy
> text segments to a staging area, then create overlay anon mappings on
> top and copy the text back into them. Now we have THP-backed text and
> very little iTLB pressure :-)
> 
> That said, we haven't run into the uprobes issue yet.
> 

Thanks Johannes, Kirill, Rik.


Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 01/12] uprobes: split THPs before trying replace them
@ 2017-01-26  2:54             ` Srikar Dronamraju
  0 siblings, 0 replies; 71+ messages in thread
From: Srikar Dronamraju @ 2017-01-26  2:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Johannes Weiner, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Hugh Dickins, Rik van Riel, linux-mm,
	linux-kernel, Oleg Nesterov, Peter Zijlstra

> > 
> > The first time the breakpoint is hit on a page, it replaces the text
> > page with anon page.  Now lets assume we insert breakpoints in all the
> > pages in a range. Here each page is individually replaced by a non THP
> > anonpage. (since we dont have bulk breakpoint insertion support,
> > breakpoint insertion happens one at a time). Now the only interesting
> > case may be when each of these replaced pages happen to be physically
> > contiguous so that THP kicks in to replace all of these pages with one
> > THP page. Can happen in practice?
> > 
> > Are there any other cases that I have missed?
> 
> We use a hack in our applications where we open /proc/self/maps, copy
> text segments to a staging area, then create overlay anon mappings on
> top and copy the text back into them. Now we have THP-backed text and
> very little iTLB pressure :-)
> 
> That said, we haven't run into the uprobes issue yet.
> 

Thanks Johannes, Kirill, Rik.


Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 09/12] mm, uprobes: convert __replace_page() to page_check_walk()
  2017-01-24 16:28   ` Kirill A. Shutemov
@ 2017-01-26  2:58     ` Srikar Dronamraju
  -1 siblings, 0 replies; 71+ messages in thread
From: Srikar Dronamraju @ 2017-01-26  2:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton,
	linux-mm, linux-kernel

* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [2017-01-24 19:28:21]:

> For consistency, it worth converting all page_check_address() to
> page_check_walk(), so we could drop the former.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  kernel/events/uprobes.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 1e65c79e52a6..6dbaa93b22fa 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -153,14 +153,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  				struct page *old_page, struct page *new_page)
>  {
>  	struct mm_struct *mm = vma->vm_mm;

I thought the subject is a bit misleading, it looks as if we are
replacing __replace_page. Can it be changed to
"Convert __replace_page() to use page_check_walk()" ?

Otherwise looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 09/12] mm, uprobes: convert __replace_page() to page_check_walk()
@ 2017-01-26  2:58     ` Srikar Dronamraju
  0 siblings, 0 replies; 71+ messages in thread
From: Srikar Dronamraju @ 2017-01-26  2:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Hugh Dickins, Rik van Riel, Andrew Morton,
	linux-mm, linux-kernel

* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [2017-01-24 19:28:21]:

> For consistency, it worth converting all page_check_address() to
> page_check_walk(), so we could drop the former.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  kernel/events/uprobes.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 1e65c79e52a6..6dbaa93b22fa 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -153,14 +153,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  				struct page *old_page, struct page *new_page)
>  {
>  	struct mm_struct *mm = vma->vm_mm;

I thought the subject is a bit misleading, it looks as if we are
replacing __replace_page. Can it be changed to
"Convert __replace_page() to use page_check_walk()" ?

Otherwise looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2017-01-26  3:00 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-24 16:28 [PATCH 00/12] Fix few rmap-related THP bugs Kirill A. Shutemov
2017-01-24 16:28 ` Kirill A. Shutemov
2017-01-24 16:28 ` [PATCH 01/12] uprobes: split THPs before trying replace them Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 18:08   ` Rik van Riel
2017-01-24 18:08     ` Rik van Riel
2017-01-24 21:28   ` Andrew Morton
2017-01-24 21:28     ` Andrew Morton
2017-01-24 22:22     ` Kirill A. Shutemov
2017-01-24 22:22       ` Kirill A. Shutemov
2017-01-24 22:35       ` Andrew Morton
2017-01-24 22:35         ` Andrew Morton
2017-01-24 22:56         ` Kirill A. Shutemov
2017-01-24 22:56           ` Kirill A. Shutemov
2017-01-25 16:55       ` Srikar Dronamraju
2017-01-25 16:55         ` Srikar Dronamraju
2017-01-25 17:44         ` Rik van Riel
2017-01-25 17:44         ` Kirill A. Shutemov
2017-01-25 17:44           ` Kirill A. Shutemov
2017-01-25 18:35         ` Johannes Weiner
2017-01-25 18:35           ` Johannes Weiner
2017-01-25 18:38           ` Kirill A. Shutemov
2017-01-25 18:38             ` Kirill A. Shutemov
2017-01-26  2:54           ` Srikar Dronamraju
2017-01-26  2:54             ` Srikar Dronamraju
2017-01-25 18:22   ` Johannes Weiner
2017-01-25 18:22     ` Johannes Weiner
2017-01-24 16:28 ` [PATCH 02/12] mm: introduce page_check_walk() Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 21:41   ` Andrew Morton
2017-01-24 21:41     ` Andrew Morton
2017-01-24 22:50     ` Kirill A. Shutemov
2017-01-24 22:50       ` Kirill A. Shutemov
2017-01-24 22:55       ` Andrew Morton
2017-01-24 22:55         ` Andrew Morton
2017-01-25 17:53         ` Kirill A. Shutemov
2017-01-25 17:53           ` Kirill A. Shutemov
2017-01-25  1:19   ` kbuild test robot
2017-01-25  1:19     ` kbuild test robot
2017-01-25  1:59   ` kbuild test robot
2017-01-25  1:59     ` kbuild test robot
2017-01-24 16:28 ` [PATCH 03/12] mm: fix handling PTE-mapped THPs in page_referenced() Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 16:28 ` [PATCH 04/12] mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs() Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 16:28 ` [PATCH 05/12] mm, rmap: check all VMAs that PTE-mapped THP can be part of Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 16:28 ` [PATCH 06/12] mm: convert page_mkclean_one() to page_check_walk() Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-25  1:44   ` kbuild test robot
2017-01-25  1:44     ` kbuild test robot
2017-01-25  2:00   ` kbuild test robot
2017-01-25  2:00     ` kbuild test robot
2017-01-24 16:28 ` [PATCH 07/12] mm: convert try_to_unmap_one() " Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-25  3:13   ` kbuild test robot
2017-01-25  3:13     ` kbuild test robot
2017-01-24 16:28 ` [PATCH 08/12] mm, ksm: convert write_protect_page() " Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 16:28 ` [PATCH 09/12] mm, uprobes: convert __replace_page() " Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-26  2:58   ` Srikar Dronamraju
2017-01-26  2:58     ` Srikar Dronamraju
2017-01-24 16:28 ` [PATCH 10/12] mm: convert page_mapped_in_vma() " Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 16:28 ` [PATCH 11/12] mm: drop page_check_address{,_transhuge} Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-24 16:28 ` [PATCH 12/12] mm: convert remove_migration_pte() to page_check_walk() Kirill A. Shutemov
2017-01-24 16:28   ` Kirill A. Shutemov
2017-01-25  1:54   ` kbuild test robot
2017-01-25  1:54     ` kbuild test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.