* [PATCH -mm v2 00/11] pagewalk: standardize current users, move pmd locking, apply to mincore
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

This is version 2 of the page table walker patchset.

I've moved this cleanup work forward and added some improvements over the
previous version. The major changes are:
 - removed walk->skip, which became removable after refactoring the
   existing users
 - unified the arguments of the entry handlers (pte|pmd|hugetlb)_entry(),
   which allows the same function to be used for multiple handlers
   (a small usage sketch of the reworked walker follows below).
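
Below is a minimal, illustrative sketch (not part of the series) of how a
caller wires up the reworked walker: only leaf handlers are registered and
walk_page_range() drives the page table loops. The callback name and the
counter are hypothetical; the pte_entry() signature follows the one used
in the patches in this series.

static int count_present_pte(pte_t *pte, unsigned long addr,
			     unsigned long next, struct mm_walk *walk)
{
	unsigned long *nr_present = walk->private;

	/* The walker calls this with the pte mapped and walk->ptl held. */
	if (pte_present(*pte))
		(*nr_present)++;
	return 0;
}

static unsigned long count_present(struct vm_area_struct *vma)
{
	unsigned long nr_present = 0;
	struct mm_walk count_walk = {
		.pte_entry = count_present_pte,
		.mm = vma->vm_mm,
		.private = &nr_present,
	};

	walk_page_range(vma->vm_start, vma->vm_end, &count_walk);
	return nr_present;
}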

This patchset is based on mmotm-2014-05-21-16-57.

Tree: git@github.com:Naoya-Horiguchi/linux.git
Branch: mmotm-2014-05-21-16-57/page_table_walker.v2

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (11):
      pagewalk: remove pgd_entry() and pud_entry()
      madvise: cleanup swapin_walk_pmd_entry()
      memcg: separate mem_cgroup_move_charge_pte_range()
      pagewalk: move pmd_trans_huge_lock() from callbacks to common code
      pagewalk: remove mm_walk->skip
      pagewalk: add size to struct mm_walk
      pagewalk: change type of arg of callbacks
      pagewalk: update comment on walk_page_range()
      fs/proc/task_mmu.c: refactor smaps
      fs/proc/task_mmu.c: clean up gather_*_stats()
      mincore: apply page table walker on do_mincore()

 arch/openrisc/kernel/dma.c     |   6 +-
 arch/powerpc/mm/subpage-prot.c |   5 +-
 fs/proc/task_mmu.c             | 140 ++++++++---------------------
 include/linux/mm.h             |  21 ++---
 mm/huge_memory.c               |  20 -----
 mm/madvise.c                   |  55 +++++-------
 mm/memcontrol.c                | 170 +++++++++++++++++------------------
 mm/memory.c                    |   5 +-
 mm/mempolicy.c                 |  15 ++--
 mm/mincore.c                   | 195 ++++++++++++++---------------------------
 mm/pagewalk.c                  | 143 +++++++++++++-----------------
 11 files changed, 294 insertions(+), 481 deletions(-)

* [PATCH -mm v2 01/11] pagewalk: remove pgd_entry() and pud_entry()
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

Currently no user of the page table walker sets ->pgd_entry() or ->pud_entry(),
so checking for them in every loop iteration just wastes CPU cycles.
Let's remove them to reduce the overhead.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/mm.h |  6 ------
 mm/pagewalk.c      | 18 +-----------------
 2 files changed, 1 insertion(+), 23 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
index 563c79ea07bd..b4aa6579f2b1 100644
--- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
+++ mmotm-2014-05-21-16-57/include/linux/mm.h
@@ -1092,8 +1092,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 
 /**
  * mm_walk - callbacks for walk_page_range
- * @pgd_entry: if set, called for each non-empty PGD (top-level) entry
- * @pud_entry: if set, called for each non-empty PUD (2nd-level) entry
  * @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
  *	       this handler is required to be able to handle
  *	       pmd_trans_huge() pmds.  They may simply choose to
@@ -1115,10 +1113,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
  * (see the comment on walk_page_range() for more details)
  */
 struct mm_walk {
-	int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
-			 unsigned long next, struct mm_walk *walk);
-	int (*pud_entry)(pud_t *pud, unsigned long addr,
-	                 unsigned long next, struct mm_walk *walk);
 	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
 	int (*pte_entry)(pte_t *pte, unsigned long addr,
diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
index 2eda3dbe0b52..e734f63276c2 100644
--- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
+++ mmotm-2014-05-21-16-57/mm/pagewalk.c
@@ -115,14 +115,6 @@ static int walk_pud_range(pgd_t *pgd, unsigned long addr,
 			continue;
 		}
 
-		if (walk->pud_entry) {
-			err = walk->pud_entry(pud, addr, next, walk);
-			if (skip_lower_level_walking(walk))
-				continue;
-			if (err)
-				break;
-		}
-
 		if (walk->pmd_entry || walk->pte_entry) {
 			err = walk_pmd_range(pud, addr, next, walk);
 			if (err)
@@ -152,15 +144,7 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
 			continue;
 		}
 
-		if (walk->pgd_entry) {
-			err = walk->pgd_entry(pgd, addr, next, walk);
-			if (skip_lower_level_walking(walk))
-				continue;
-			if (err)
-				break;
-		}
-
-		if (walk->pud_entry || walk->pmd_entry || walk->pte_entry) {
+		if (walk->pmd_entry || walk->pte_entry) {
 			err = walk_pud_range(pgd, addr, next, walk);
 			if (err)
 				break;
-- 
1.9.3


* [PATCH -mm v2 02/11] madvise: cleanup swapin_walk_pmd_entry()
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

With the recent updates to the page table walker, we can share more of the
walking code. Unlike many other users, this swapin walk has to handle swap
entries, so we need to be careful about ptl locking: the swapin operation,
read_swap_cache_async(), can trigger page reclaim, so we can't keep holding
the ptl throughout the pte loop.
To handle the ptl properly in pte_entry(), this patch adds two new members
to struct mm_walk.

This cleanup is necessary to get to the final form of the page table walker,
where all caller-specific work is done on leaf entries (IOW, pmd_entry() is
used only for trans_huge pmds).
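
For reference, the unlock/relock pattern a pte_entry() callback has to
follow when it may sleep, using the new walk->pmd and walk->ptl members,
looks roughly like the sketch below. This only distills the madvise change
further down; the callback name is made up.

static int sleepy_pte_entry(pte_t *pte, unsigned long addr,
			    unsigned long end, struct mm_walk *walk)
{
	/* First pte of this pmd range, needed by pte_unmap_unlock(). */
	pte_t *orig_pte = pte - ((addr & (PMD_SIZE - 1)) >> PAGE_SHIFT);
	pte_t ptent = *pte;	/* copy the entry before dropping the lock */

	pte_unmap_unlock(orig_pte, walk->ptl);

	/* ... work on ptent that may sleep, e.g. read_swap_cache_async() ... */

	/* Re-map and re-lock before returning to the walker's pte loop. */
	pte_offset_map(walk->pmd, addr & PMD_MASK);
	spin_lock(walk->ptl);
	return 0;
}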

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
---
 include/linux/mm.h |  4 ++++
 mm/madvise.c       | 54 +++++++++++++++++++++++-------------------------------
 mm/pagewalk.c      | 11 +++++------
 3 files changed, 32 insertions(+), 37 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
index b4aa6579f2b1..aa832161a1ff 100644
--- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
+++ mmotm-2014-05-21-16-57/include/linux/mm.h
@@ -1108,6 +1108,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
  * @vma:       vma currently walked
  * @skip:      internal control flag which is set when we skip the lower
  *             level entries.
+ * @pmd:       current pmd entry
+ * @ptl:       page table lock associated with current entry
  * @private:   private data for callbacks' use
  *
  * (see the comment on walk_page_range() for more details)
@@ -1126,6 +1128,8 @@ struct mm_walk {
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	int skip;
+	pmd_t *pmd;
+	spinlock_t *ptl;
 	void *private;
 };
 
diff --git mmotm-2014-05-21-16-57.orig/mm/madvise.c mmotm-2014-05-21-16-57/mm/madvise.c
index a402f8fdc68e..06b390a6fbbd 100644
--- mmotm-2014-05-21-16-57.orig/mm/madvise.c
+++ mmotm-2014-05-21-16-57/mm/madvise.c
@@ -135,38 +135,31 @@ static long madvise_behavior(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_SWAP
-static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
+/*
+ * Assuming that page table walker holds page table lock.
+ */
+static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
 	unsigned long end, struct mm_walk *walk)
 {
-	pte_t *orig_pte;
-	struct vm_area_struct *vma = walk->private;
-	unsigned long index;
-
-	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
-		return 0;
-
-	for (index = start; index != end; index += PAGE_SIZE) {
-		pte_t pte;
-		swp_entry_t entry;
-		struct page *page;
-		spinlock_t *ptl;
-
-		orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
-		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
-		pte_unmap_unlock(orig_pte, ptl);
-
-		if (pte_present(pte) || pte_none(pte) || pte_file(pte))
-			continue;
-		entry = pte_to_swp_entry(pte);
-		if (unlikely(non_swap_entry(entry)))
-			continue;
-
-		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
-								vma, index);
-		if (page)
-			page_cache_release(page);
-	}
+	pte_t ptent;
+	pte_t *orig_pte = pte - ((start & (PMD_SIZE - 1)) >> PAGE_SHIFT);
+	swp_entry_t entry;
+	struct page *page;
 
+	ptent = *pte;
+	pte_unmap_unlock(orig_pte, walk->ptl);
+	if (pte_present(ptent) || pte_none(ptent) || pte_file(ptent))
+		goto lock;
+	entry = pte_to_swp_entry(ptent);
+	if (unlikely(non_swap_entry(entry)))
+		goto lock;
+	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
+				     walk->vma, start);
+	if (page)
+		page_cache_release(page);
+lock:
+	pte_offset_map(walk->pmd, start & PMD_MASK);
+	spin_lock(walk->ptl);
 	return 0;
 }
 
@@ -175,8 +168,7 @@ static void force_swapin_readahead(struct vm_area_struct *vma,
 {
 	struct mm_walk walk = {
 		.mm = vma->vm_mm,
-		.pmd_entry = swapin_walk_pmd_entry,
-		.private = vma,
+		.pte_entry = swapin_walk_pte_entry,
 	};
 
 	walk_page_range(start, end, &walk);
diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
index e734f63276c2..24311d6f5c20 100644
--- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
+++ mmotm-2014-05-21-16-57/mm/pagewalk.c
@@ -27,10 +27,10 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
 	struct mm_struct *mm = walk->mm;
 	pte_t *pte;
 	pte_t *orig_pte;
-	spinlock_t *ptl;
 	int err = 0;
 
-	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	walk->pmd = pmd;
+	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &walk->ptl);
 	do {
 		if (pte_none(*pte)) {
 			if (walk->pte_hole)
@@ -48,7 +48,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
 		if (err)
 		       break;
 	} while (pte++, addr += PAGE_SIZE, addr < end);
-	pte_unmap_unlock(orig_pte, ptl);
+	pte_unmap_unlock(orig_pte, walk->ptl);
 	cond_resched();
 	return addr == end ? 0 : err;
 }
@@ -172,7 +172,6 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	unsigned long hmask = huge_page_mask(h);
 	pte_t *pte;
 	int err = 0;
-	spinlock_t *ptl;
 
 	do {
 		next = hugetlb_entry_end(h, addr, end);
@@ -186,14 +185,14 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 				break;
 			continue;
 		}
-		ptl = huge_pte_lock(h, mm, pte);
+		walk->ptl = huge_pte_lock(h, mm, pte);
 		/*
 		 * Callers should have their own way to handle swap entries
 		 * in walk->hugetlb_entry().
 		 */
 		if (walk->hugetlb_entry)
 			err = walk->hugetlb_entry(pte, addr, next, walk);
-		spin_unlock(ptl);
+		spin_unlock(walk->ptl);
 		if (err)
 			break;
 	} while (addr = next, addr != end);
-- 
1.9.3


* [PATCH -mm v2 03/11] memcg: separate mem_cgroup_move_charge_pte_range()
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

mem_cgroup_move_charge_pte_range() handles both pte and pmd entries, which
doesn't fit the standardized walker model, so let's clean it up. One tricky
part is the retry performed when we detect !mc.precharge. In that case we
retry the same entry, so we don't have to leave the pte loop. By rewriting
this retry inside the pte loop, we can separate pmd_entry() and pte_entry(),
which is what we need.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memcontrol.c | 127 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 66 insertions(+), 61 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/mm/memcontrol.c mmotm-2014-05-21-16-57/mm/memcontrol.c
index 6970857ba0c8..01a66a208769 100644
--- mmotm-2014-05-21-16-57.orig/mm/memcontrol.c
+++ mmotm-2014-05-21-16-57/mm/memcontrol.c
@@ -6881,14 +6881,72 @@ static void mem_cgroup_cancel_attach(struct cgroup_subsys_state *css,
 	mem_cgroup_clear_mc();
 }
 
-static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
+static int mem_cgroup_move_charge_pte(pte_t *pte,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
 {
 	int ret = 0;
 	struct vm_area_struct *vma = walk->vma;
-	pte_t *pte;
-	spinlock_t *ptl;
+	union mc_target target;
+	struct page *page;
+	struct page_cgroup *pc;
+	swp_entry_t ent;
+
+retry:
+	if (!mc.precharge) {
+		pte_t *orig_pte = pte - ((addr & (PMD_SIZE - 1)) >> PAGE_SHIFT);
+		pte_unmap_unlock(orig_pte, walk->ptl);
+		cond_resched();
+		/*
+		 * We have consumed all precharges we got in can_attach().
+		 * We try charge one by one, but don't do any additional
+		 * charges to mc.to if we have failed in charge once in attach()
+		 * phase.
+		 */
+		ret = mem_cgroup_do_precharge(1);
+		pte_offset_map(walk->pmd, addr & PMD_MASK);
+		spin_lock(walk->ptl);
+		if (!ret)
+			goto retry;
+		return ret;
+	}
+
+	switch (get_mctgt_type(vma, addr, *pte, &target)) {
+	case MC_TARGET_PAGE:
+		page = target.page;
+		if (isolate_lru_page(page))
+			goto put;
+		pc = lookup_page_cgroup(page);
+		if (!mem_cgroup_move_account(page, 1, pc,
+					     mc.from, mc.to)) {
+			mc.precharge--;
+			/* we uncharge from mc.from later. */
+			mc.moved_charge++;
+		}
+		putback_lru_page(page);
+put:		/* get_mctgt_type() gets the page */
+		put_page(page);
+		break;
+	case MC_TARGET_SWAP:
+		ent = target.ent;
+		if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
+			mc.precharge--;
+			/* we fixup refcnts and charges later. */
+			mc.moved_swap++;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
 	enum mc_target_type target_type;
 	union mc_target target;
 	struct page *page;
@@ -6924,71 +6982,18 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 			put_page(page);
 		}
 		spin_unlock(ptl);
-		return 0;
-	}
-
-	if (pmd_trans_unstable(pmd))
-		return 0;
-retry:
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	for (; addr != end; addr += PAGE_SIZE) {
-		pte_t ptent = *(pte++);
-		swp_entry_t ent;
-
-		if (!mc.precharge)
-			break;
-
-		switch (get_mctgt_type(vma, addr, ptent, &target)) {
-		case MC_TARGET_PAGE:
-			page = target.page;
-			if (isolate_lru_page(page))
-				goto put;
-			pc = lookup_page_cgroup(page);
-			if (!mem_cgroup_move_account(page, 1, pc,
-						     mc.from, mc.to)) {
-				mc.precharge--;
-				/* we uncharge from mc.from later. */
-				mc.moved_charge++;
-			}
-			putback_lru_page(page);
-put:			/* get_mctgt_type() gets the page */
-			put_page(page);
-			break;
-		case MC_TARGET_SWAP:
-			ent = target.ent;
-			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
-				mc.precharge--;
-				/* we fixup refcnts and charges later. */
-				mc.moved_swap++;
-			}
-			break;
-		default:
-			break;
-		}
-	}
-	pte_unmap_unlock(pte - 1, ptl);
-	cond_resched();
-
-	if (addr != end) {
-		/*
-		 * We have consumed all precharges we got in can_attach().
-		 * We try charge one by one, but don't do any additional
-		 * charges to mc.to if we have failed in charge once in attach()
-		 * phase.
-		 */
-		ret = mem_cgroup_do_precharge(1);
-		if (!ret)
-			goto retry;
+		/* don't call mem_cgroup_move_charge_pte() */
+		walk->skip = 1;
 	}
-
-	return ret;
+	return 0;
 }
 
 static void mem_cgroup_move_charge(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
 	struct mm_walk mem_cgroup_move_charge_walk = {
-		.pmd_entry = mem_cgroup_move_charge_pte_range,
+		.pmd_entry = mem_cgroup_move_charge_pmd,
+		.pte_entry = mem_cgroup_move_charge_pte,
 		.mm = mm,
 	};
 
-- 
1.9.3


* [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

Now all current users of the page table walker are canonicalized, i.e.
pmd_entry() handles only trans_huge pmd entries and pte_entry() handles pte
entries, so we can factor out more common code.
This patch moves the pmd_trans_huge_lock() call from each pmd_entry() into
the pagewalk core.
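
For illustration, a pmd_entry() callback reduces to something like the
sketch below after this change: the core now takes pmd_trans_huge_lock()
and calls the handler with walk->ptl held, only for a stable trans_huge
pmd. The callback name and the counter are hypothetical.

static int thp_bytes_pmd(pmd_t *pmd, unsigned long addr,
			 unsigned long end, struct mm_walk *walk)
{
	unsigned long *thp_bytes = walk->private;

	/* No pmd_trans_huge_lock()/spin_unlock() boilerplate needed here. */
	*thp_bytes += HPAGE_PMD_SIZE;
	return 0;
}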

ChangeLog v2:
- add null check walk->vma in walk_pmd_range()
- move comment update into a separate patch

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 arch/powerpc/mm/subpage-prot.c |  2 ++
 fs/proc/task_mmu.c             | 66 ++++++++++++++----------------------------
 mm/memcontrol.c                | 55 ++++++++++-------------------------
 mm/pagewalk.c                  | 18 ++++++++++--
 4 files changed, 55 insertions(+), 86 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
index fa9fb5b4c66c..d0d94ac606f3 100644
--- mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c
+++ mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
@@ -135,7 +135,9 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
+	spin_unlock(walk->ptl);
 	split_huge_page_pmd(vma, addr, pmd);
+	spin_lock(walk->ptl);
 	return 0;
 }
 
diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
index fa6d6a4e85b3..059206ea3c6b 100644
--- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
+++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
@@ -496,15 +496,8 @@ static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
 			struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
-	spinlock_t *ptl;
-
-	if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
-		smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
-		spin_unlock(ptl);
-		mss->anonymous_thp += HPAGE_PMD_SIZE;
-		/* don't call smaps_pte() */
-		walk->skip = 1;
-	}
+	smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
+	mss->anonymous_thp += HPAGE_PMD_SIZE;
 	return 0;
 }
 
@@ -983,31 +976,21 @@ static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
 	struct vm_area_struct *vma = walk->vma;
 	struct pagemapread *pm = walk->private;
 	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
-	spinlock_t *ptl;
-
-	if (!vma)
-		return err;
-	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		int pmd_flags2;
+	int pmd_flags2;
 
-		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
-			pmd_flags2 = __PM_SOFT_DIRTY;
-		else
-			pmd_flags2 = 0;
+	if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
+		pmd_flags2 = __PM_SOFT_DIRTY;
+	else
+		pmd_flags2 = 0;
 
-		for (; addr != end; addr += PAGE_SIZE) {
-			unsigned long offset;
+	for (; addr != end; addr += PAGE_SIZE) {
+		unsigned long offset;
 
-			offset = (addr & ~PAGEMAP_WALK_MASK) >>
-					PAGE_SHIFT;
-			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
-			err = add_to_pagemap(addr, &pme, pm);
-			if (err)
-				break;
-		}
-		spin_unlock(ptl);
-		/* don't call pagemap_pte() */
-		walk->skip = 1;
+		offset = (addr & ~PAGEMAP_WALK_MASK) >> PAGE_SHIFT;
+		thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
+		err = add_to_pagemap(addr, &pme, pm);
+		if (err)
+			break;
 	}
 	return err;
 }
@@ -1277,20 +1260,13 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
 {
 	struct numa_maps *md = walk->private;
 	struct vm_area_struct *vma = walk->vma;
-	spinlock_t *ptl;
-
-	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		pte_t huge_pte = *(pte_t *)pmd;
-		struct page *page;
-
-		page = can_gather_numa_stats(huge_pte, vma, addr);
-		if (page)
-			gather_stats(page, md, pte_dirty(huge_pte),
-				     HPAGE_PMD_SIZE/PAGE_SIZE);
-		spin_unlock(ptl);
-		/* don't call gather_pte_stats() */
-		walk->skip = 1;
-	}
+	pte_t huge_pte = *(pte_t *)pmd;
+	struct page *page;
+
+	page = can_gather_numa_stats(huge_pte, vma, addr);
+	if (page)
+		gather_stats(page, md, pte_dirty(huge_pte),
+			     HPAGE_PMD_SIZE/PAGE_SIZE);
 	return 0;
 }
 #ifdef CONFIG_HUGETLB_PAGE
diff --git mmotm-2014-05-21-16-57.orig/mm/memcontrol.c mmotm-2014-05-21-16-57/mm/memcontrol.c
index 01a66a208769..bb987cb9e043 100644
--- mmotm-2014-05-21-16-57.orig/mm/memcontrol.c
+++ mmotm-2014-05-21-16-57/mm/memcontrol.c
@@ -6723,15 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
 					struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
-	spinlock_t *ptl;
-
-	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
-			mc.precharge += HPAGE_PMD_NR;
-		spin_unlock(ptl);
-		/* don't call mem_cgroup_count_precharge_pte() */
-		walk->skip = 1;
-	}
+
+	if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
+		mc.precharge += HPAGE_PMD_NR;
 	return 0;
 }
 
@@ -6952,38 +6946,21 @@ static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
 	struct page *page;
 	struct page_cgroup *pc;
 
-	/*
-	 * We don't take compound_lock() here but no race with splitting thp
-	 * happens because:
-	 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is not
-	 *    under splitting, which means there's no concurrent thp split,
-	 *  - if another thread runs into split_huge_page() just after we
-	 *    entered this if-block, the thread must wait for page table lock
-	 *    to be unlocked in __split_huge_page_splitting(), where the main
-	 *    part of thp split is not executed yet.
-	 */
-	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		if (mc.precharge < HPAGE_PMD_NR) {
-			spin_unlock(ptl);
-			return 0;
-		}
-		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
-		if (target_type == MC_TARGET_PAGE) {
-			page = target.page;
-			if (!isolate_lru_page(page)) {
-				pc = lookup_page_cgroup(page);
-				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
-							pc, mc.from, mc.to)) {
-					mc.precharge -= HPAGE_PMD_NR;
-					mc.moved_charge += HPAGE_PMD_NR;
-				}
-				putback_lru_page(page);
+	if (mc.precharge < HPAGE_PMD_NR)
+		return 0;
+	target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
+	if (target_type == MC_TARGET_PAGE) {
+		page = target.page;
+		if (!isolate_lru_page(page)) {
+			pc = lookup_page_cgroup(page);
+			if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
+						     pc, mc.from, mc.to)) {
+				mc.precharge -= HPAGE_PMD_NR;
+				mc.moved_charge += HPAGE_PMD_NR;
 			}
-			put_page(page);
+			putback_lru_page(page);
 		}
-		spin_unlock(ptl);
-		/* don't call mem_cgroup_move_charge_pte() */
-		walk->skip = 1;
+		put_page(page);
 	}
 	return 0;
 }
diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
index 24311d6f5c20..f1a3417d0b51 100644
--- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
+++ mmotm-2014-05-21-16-57/mm/pagewalk.c
@@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
 			continue;
 		}
 
-		if (walk->pmd_entry) {
-			err = walk->pmd_entry(pmd, addr, next, walk);
+		/*
+		 * We don't take compound_lock() here but no race with splitting
+		 * thp happens because:
+		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
+		 *    not under splitting, which means there's no concurrent
+		 *    thp split,
+		 *  - if another thread runs into split_huge_page() just after
+		 *    we entered this if-block, the thread must wait for page
+		 *    table lock to be unlocked in __split_huge_page_splitting(),
+		 *    where the main part of thp split is not executed yet.
+		 */
+		if (walk->pmd_entry && walk->vma) {
+			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
+				err = walk->pmd_entry(pmd, addr, next, walk);
+				spin_unlock(walk->ptl);
+			}
 			if (skip_lower_level_walking(walk))
 				continue;
 			if (err)
-- 
1.9.3


* [PATCH -mm v2 05/11] pagewalk: remove mm_walk->skip
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

With pmd locking moved into the walker core, mm_walk->skip matters much
less: only walk_page_test() and walk->test_walk() still use it. Neither of
them uses a positive return value, so we can define a positive return value
to mean "skip the current vma".
Thus this patch removes mm_walk->skip.
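
For illustration, a test_walk() callback under the new return convention
(0 = walk this vma, 1 = skip it, negative = abort with an error) would look
like the sketch below; the callback name is made up.

static int anon_only_test_walk(unsigned long start, unsigned long end,
			       struct mm_walk *walk)
{
	struct vm_area_struct *vma = walk->vma;

	if (vma->vm_flags & VM_PFNMAP)
		return 1;	/* skip pfn mappings */
	if (vma->vm_file)
		return 1;	/* skip file-backed vmas */
	return 0;		/* walk the page tables of this vma */
}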

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/proc/task_mmu.c |  4 ++--
 include/linux/mm.h |  3 ---
 mm/mempolicy.c     |  9 ++++-----
 mm/pagewalk.c      | 36 ++++++++----------------------------
 4 files changed, 14 insertions(+), 38 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
index 059206ea3c6b..8211f6c8236d 100644
--- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
+++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
@@ -755,9 +755,9 @@ static int clear_refs_test_walk(unsigned long start, unsigned long end,
 	 * Writing 4 to /proc/pid/clear_refs affects all pages.
 	 */
 	if (cp->type == CLEAR_REFS_ANON && vma->vm_file)
-		walk->skip = 1;
+		return 1;
 	if (cp->type == CLEAR_REFS_MAPPED && !vma->vm_file)
-		walk->skip = 1;
+		return 1;
 	if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
 		if (vma->vm_flags & VM_SOFTDIRTY)
 			vma->vm_flags &= ~VM_SOFTDIRTY;
diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
index aa832161a1ff..0a20674c84e2 100644
--- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
+++ mmotm-2014-05-21-16-57/include/linux/mm.h
@@ -1106,8 +1106,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
  *             right now." 0 means "skip the current vma."
  * @mm:        mm_struct representing the target process of page table walk
  * @vma:       vma currently walked
- * @skip:      internal control flag which is set when we skip the lower
- *             level entries.
  * @pmd:       current pmd entry
  * @ptl:       page table lock associated with current entry
  * @private:   private data for callbacks' use
@@ -1127,7 +1125,6 @@ struct mm_walk {
 			struct mm_walk *walk);
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
-	int skip;
 	pmd_t *pmd;
 	spinlock_t *ptl;
 	void *private;
diff --git mmotm-2014-05-21-16-57.orig/mm/mempolicy.c mmotm-2014-05-21-16-57/mm/mempolicy.c
index cf3b995b21d0..b8267f753748 100644
--- mmotm-2014-05-21-16-57.orig/mm/mempolicy.c
+++ mmotm-2014-05-21-16-57/mm/mempolicy.c
@@ -596,22 +596,21 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	}
 
 	qp->prev = vma;
-	walk->skip = 1;
 
 	if (vma->vm_flags & VM_PFNMAP)
-		return 0;
+		return 1;
 
 	if (flags & MPOL_MF_LAZY) {
 		change_prot_numa(vma, start, endvma);
-		return 0;
+		return 1;
 	}
 
 	if ((flags & MPOL_MF_STRICT) ||
 	    ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
 	     vma_migratable(vma)))
 		/* queue pages from current vma */
-		walk->skip = 0;
-	return 0;
+		return 0;
+	return 1;
 }
 
 /*
diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
index f1a3417d0b51..61d6bd9545d6 100644
--- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
+++ mmotm-2014-05-21-16-57/mm/pagewalk.c
@@ -3,24 +3,6 @@
 #include <linux/sched.h>
 #include <linux/hugetlb.h>
 
-/*
- * Check the current skip status of page table walker.
- *
- * Here what I mean by skip is to skip lower level walking, and that was
- * determined for each entry independently. For example, when walk_pmd_range
- * handles a pmd_trans_huge we don't have to walk over ptes under that pmd,
- * and the skipping does not affect the walking over ptes under other pmds.
- * That's why we reset @walk->skip after tested.
- */
-static bool skip_lower_level_walking(struct mm_walk *walk)
-{
-	if (walk->skip) {
-		walk->skip = 0;
-		return true;
-	}
-	return false;
-}
-
 static int walk_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
@@ -89,8 +71,6 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
 				err = walk->pmd_entry(pmd, addr, next, walk);
 				spin_unlock(walk->ptl);
 			}
-			if (skip_lower_level_walking(walk))
-				continue;
 			if (err)
 				break;
 		}
@@ -225,9 +205,9 @@ static inline int walk_hugetlb_range(unsigned long addr, unsigned long end,
 
 /*
  * Decide whether we really walk over the current vma on [@start, @end)
- * or skip it. When we skip it, we set @walk->skip to 1.
- * The return value is used to control the page table walking to
- * continue (for zero) or not (for non-zero).
+ * or skip it via the returned value. Return 0 if we do walk over the
+ * current vma, and return 1 if we skip the vma. Negative values means
+ * error, where we abort the current walk.
  *
  * Default check (only VM_PFNMAP check for now) is used when the caller
  * doesn't define test_walk() callback.
@@ -245,7 +225,7 @@ static int walk_page_test(unsigned long start, unsigned long end,
 	 * page backing a VM_PFNMAP range. See also commit a9ff785e4437.
 	 */
 	if (vma->vm_flags & VM_PFNMAP)
-		walk->skip = 1;
+		return 1;
 	return 0;
 }
 
@@ -330,9 +310,9 @@ int walk_page_range(unsigned long start, unsigned long end,
 			next = min(end, vma->vm_end);
 
 			err = walk_page_test(start, next, walk);
-			if (skip_lower_level_walking(walk))
+			if (err == 1)
 				continue;
-			if (err)
+			if (err < 0)
 				break;
 		}
 		err = __walk_page_range(start, next, walk);
@@ -353,9 +333,9 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
 	VM_BUG_ON(!vma);
 	walk->vma = vma;
 	err = walk_page_test(vma->vm_start, vma->vm_end, walk);
-	if (skip_lower_level_walking(walk))
+	if (err == 1)
 		return 0;
-	if (err)
+	if (err < 0)
 		return err;
 	return __walk_page_range(vma->vm_start, vma->vm_end, walk);
 }
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH -mm v2 06/11] pagewalk: add size to struct mm_walk
  2014-06-12 21:48 ` Naoya Horiguchi
@ 2014-06-12 21:48   ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

This variable is helpful when we want to share a single callback function
among multiple slots (for example between pte_entry() and pmd_entry()), as
done in later patches.
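
For illustration, a minimal sketch (not part of this patch; the function name
is hypothetical) of a hugetlb_entry() handler that reads walk->size instead of
hard-coding a page count:

static int count_hugetlb_pages(pte_t *pte, unsigned long addr,
			       unsigned long next, struct mm_walk *walk)
{
	unsigned long *pages = walk->private;

	/* walk->size is huge_page_size(h) while walking hugetlb entries */
	if (pte_present(*pte))
		*pages += walk->size >> PAGE_SHIFT;
	return 0;
}

The same walk->size check is what later lets smaps and the numa stats code
share one handler between the pte and pmd slots.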

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/mm.h | 2 ++
 mm/pagewalk.c      | 5 +++++
 2 files changed, 7 insertions(+)

diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
index 0a20674c84e2..cbe17d9cbd7f 100644
--- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
+++ mmotm-2014-05-21-16-57/include/linux/mm.h
@@ -1108,6 +1108,7 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
  * @vma:       vma currently walked
  * @pmd:       current pmd entry
  * @ptl:       page table lock associated with current entry
+ * @size:      size of current entry
  * @private:   private data for callbacks' use
  *
  * (see the comment on walk_page_range() for more details)
@@ -1127,6 +1128,7 @@ struct mm_walk {
 	struct vm_area_struct *vma;
 	pmd_t *pmd;
 	spinlock_t *ptl;
+	long size;
 	void *private;
 };
 
diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
index 61d6bd9545d6..b46c8882c643 100644
--- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
+++ mmotm-2014-05-21-16-57/mm/pagewalk.c
@@ -11,6 +11,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *orig_pte;
 	int err = 0;
 
+	walk->size = PAGE_SIZE;
 	walk->pmd = pmd;
 	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &walk->ptl);
 	do {
@@ -42,6 +43,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
 	unsigned long next;
 	int err = 0;
 
+	walk->size = PMD_SIZE;
 	pmd = pmd_offset(pud, addr);
 	do {
 again:
@@ -97,6 +99,7 @@ static int walk_pud_range(pgd_t *pgd, unsigned long addr,
 	unsigned long next;
 	int err = 0;
 
+	walk->size = PUD_SIZE;
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
@@ -126,6 +129,7 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
 	unsigned long next;
 	int err = 0;
 
+	walk->size = PGDIR_SIZE;
 	pgd = pgd_offset(walk->mm, addr);
 	do {
 		next = pgd_addr_end(addr, end);
@@ -167,6 +171,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	pte_t *pte;
 	int err = 0;
 
+	walk->size = huge_page_size(h);
 	do {
 		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH -mm v2 07/11] pagewalk: change type of arg of callbacks
  2014-06-12 21:48 ` Naoya Horiguchi
@ 2014-06-12 21:48   ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

The page table walker focuses on leaf entries, and in some situations the
caller is interested only in the size of an entry, not in the details of the
pages it points to. In such cases it is helpful to share callback functions
between different levels. For this purpose, this patch changes the callback
arguments so that each handler receives the pointer to the entry as (void *).
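
As an illustration (not part of this patch; the names and the counting policy
are made up), one function can now be registered for both the pte and pmd
slots and cast its argument according to walk->size:

static int count_present(void *entry, unsigned long addr,
			 unsigned long next, struct mm_walk *walk)
{
	unsigned long *pages = walk->private;

	if (walk->size == PMD_SIZE) {
		pmd_t *pmd = entry;	/* trans-huge pmd */

		if (pmd_present(*pmd))
			*pages += PMD_SIZE >> PAGE_SHIFT;
	} else {
		pte_t *pte = entry;	/* regular pte */

		if (pte_present(*pte))
			(*pages)++;
	}
	return 0;
}

static unsigned long count_vma_present(struct vm_area_struct *vma)
{
	unsigned long pages = 0;
	struct mm_walk walk = {
		.pte_entry = count_present,
		.pmd_entry = count_present,
		.mm = vma->vm_mm,
		.private = &pages,
	};

	/* caller must hold vma->vm_mm->mmap_sem */
	walk_page_vma(vma, &walk);
	return pages;
}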

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 arch/openrisc/kernel/dma.c     |  6 ++++--
 arch/powerpc/mm/subpage-prot.c |  3 ++-
 fs/proc/task_mmu.c             | 31 +++++++++++++++++++------------
 include/linux/mm.h             |  6 +++---
 mm/madvise.c                   |  3 ++-
 mm/memcontrol.c                | 12 ++++++++----
 mm/mempolicy.c                 |  6 ++++--
 7 files changed, 42 insertions(+), 25 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/arch/openrisc/kernel/dma.c mmotm-2014-05-21-16-57/arch/openrisc/kernel/dma.c
index 0b77ddb1ee07..a2983e8f6f04 100644
--- mmotm-2014-05-21-16-57.orig/arch/openrisc/kernel/dma.c
+++ mmotm-2014-05-21-16-57/arch/openrisc/kernel/dma.c
@@ -29,9 +29,10 @@
 #include <asm/tlbflush.h>
 
 static int
-page_set_nocache(pte_t *pte, unsigned long addr,
+page_set_nocache(void *entry, unsigned long addr,
 		 unsigned long next, struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	unsigned long cl;
 
 	pte_val(*pte) |= _PAGE_CI;
@@ -50,9 +51,10 @@ page_set_nocache(pte_t *pte, unsigned long addr,
 }
 
 static int
-page_clear_nocache(pte_t *pte, unsigned long addr,
+page_clear_nocache(void *entry, unsigned long addr,
 		   unsigned long next, struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	pte_val(*pte) &= ~_PAGE_CI;
 
 	/*
diff --git mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
index d0d94ac606f3..d62e9adc93fb 100644
--- mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c
+++ mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
@@ -131,9 +131,10 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
+static int subpage_walk_pmd_entry(void *entry, unsigned long addr,
 				  unsigned long end, struct mm_walk *walk)
 {
+	pmd_t *pmd = entry;
 	struct vm_area_struct *vma = walk->vma;
 	spin_unlock(walk->ptl);
 	split_huge_page_pmd(vma, addr, pmd);
diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
index 8211f6c8236d..a750d0842875 100644
--- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
+++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
@@ -437,9 +437,10 @@ struct mem_size_stats {
 	u64 pss;
 };
 
-static int smaps_pte(pte_t *pte, unsigned long addr, unsigned long end,
+static int smaps_pte(void *entry, unsigned long addr, unsigned long end,
 			struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	struct mem_size_stats *mss = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	pgoff_t pgoff = linear_page_index(vma, addr);
@@ -492,11 +493,11 @@ static int smaps_pte(pte_t *pte, unsigned long addr, unsigned long end,
 	return 0;
 }
 
-static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+static int smaps_pmd(void *entry, unsigned long addr, unsigned long end,
 			struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
-	smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
+	smaps_pte(entry, addr, end, walk);
 	mss->anonymous_thp += HPAGE_PMD_SIZE;
 	return 0;
 }
@@ -720,9 +721,10 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 #endif
 }
 
-static int clear_refs_pte(pte_t *pte, unsigned long addr,
+static int clear_refs_pte(void *entry, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	struct clear_refs_private *cp = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	struct page *page;
@@ -954,9 +956,10 @@ static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemap
 }
 #endif
 
-static int pagemap_pte(pte_t *pte, unsigned long addr, unsigned long end,
+static int pagemap_pte(void *entry, unsigned long addr, unsigned long end,
 			     struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	struct vm_area_struct *vma = walk->vma;
 	struct pagemapread *pm = walk->private;
 	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
@@ -969,10 +972,11 @@ static int pagemap_pte(pte_t *pte, unsigned long addr, unsigned long end,
 	return add_to_pagemap(addr, &pme, pm);
 }
 
-static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+static int pagemap_pmd(void *entry, unsigned long addr, unsigned long end,
 			     struct mm_walk *walk)
 {
 	int err = 0;
+	pmd_t *pmd = entry;
 	struct vm_area_struct *vma = walk->vma;
 	struct pagemapread *pm = walk->private;
 	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
@@ -1009,9 +1013,10 @@ static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *
 }
 
 /* This function walks within one hugetlb entry in the single call */
-static int pagemap_hugetlb(pte_t *pte, unsigned long addr, unsigned long end,
+static int pagemap_hugetlb(void *entry, unsigned long addr, unsigned long end,
 			   struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	struct pagemapread *pm = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	int err = 0;
@@ -1243,9 +1248,10 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 	return page;
 }
 
-static int gather_pte_stats(pte_t *pte, unsigned long addr,
+static int gather_pte_stats(void *entry, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	struct numa_maps *md = walk->private;
 
 	struct page *page = can_gather_numa_stats(*pte, walk->vma, addr);
@@ -1255,12 +1261,12 @@ static int gather_pte_stats(pte_t *pte, unsigned long addr,
 	return 0;
 }
 
-static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
+static int gather_pmd_stats(void *entry, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
 	struct numa_maps *md = walk->private;
 	struct vm_area_struct *vma = walk->vma;
-	pte_t huge_pte = *(pte_t *)pmd;
+	pte_t huge_pte = *(pte_t *)entry;
 	struct page *page;
 
 	page = can_gather_numa_stats(huge_pte, vma, addr);
@@ -1270,9 +1276,10 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 #ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(pte_t *pte, unsigned long addr,
+static int gather_hugetlb_stats(void *entry, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	struct numa_maps *md;
 	struct page *page;
 
@@ -1292,7 +1299,7 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long addr,
 }
 
 #else
-static int gather_hugetlb_stats(pte_t *pte, unsigned long addr,
+static int gather_hugetlb_stats(void *entry, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
 	return 0;
diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
index cbe17d9cbd7f..08c2a128dd5c 100644
--- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
+++ mmotm-2014-05-21-16-57/include/linux/mm.h
@@ -1114,13 +1114,13 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
  * (see the comment on walk_page_range() for more details)
  */
 struct mm_walk {
-	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
+	int (*pmd_entry)(void *entry, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
-	int (*pte_entry)(pte_t *pte, unsigned long addr,
+	int (*pte_entry)(void *entry, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
 	int (*pte_hole)(unsigned long addr, unsigned long next,
 			struct mm_walk *walk);
-	int (*hugetlb_entry)(pte_t *pte, unsigned long addr,
+	int (*hugetlb_entry)(void *entry, unsigned long addr,
 			unsigned long next, struct mm_walk *walk);
 	int (*test_walk)(unsigned long addr, unsigned long next,
 			struct mm_walk *walk);
diff --git mmotm-2014-05-21-16-57.orig/mm/madvise.c mmotm-2014-05-21-16-57/mm/madvise.c
index 06b390a6fbbd..df664fcbd443 100644
--- mmotm-2014-05-21-16-57.orig/mm/madvise.c
+++ mmotm-2014-05-21-16-57/mm/madvise.c
@@ -138,9 +138,10 @@ static long madvise_behavior(struct vm_area_struct *vma,
 /*
  * Assuming that page table walker holds page table lock.
  */
-static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
+static int swapin_walk_pte_entry(void *ent, unsigned long start,
 	unsigned long end, struct mm_walk *walk)
 {
+	pte_t *pte = ent;
 	pte_t ptent;
 	pte_t *orig_pte = pte - ((start & (PMD_SIZE - 1)) >> PAGE_SHIFT);
 	swp_entry_t entry;
diff --git mmotm-2014-05-21-16-57.orig/mm/memcontrol.c mmotm-2014-05-21-16-57/mm/memcontrol.c
index bb987cb9e043..7d62b6778a5b 100644
--- mmotm-2014-05-21-16-57.orig/mm/memcontrol.c
+++ mmotm-2014-05-21-16-57/mm/memcontrol.c
@@ -6709,19 +6709,21 @@ static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
 }
 #endif
 
-static int mem_cgroup_count_precharge_pte(pte_t *pte,
+static int mem_cgroup_count_precharge_pte(void *entry,
 					unsigned long addr, unsigned long end,
 					struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	if (get_mctgt_type(walk->vma, addr, *pte, NULL))
 		mc.precharge++;	/* increment precharge temporarily */
 	return 0;
 }
 
-static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
+static int mem_cgroup_count_precharge_pmd(void *entry,
 					unsigned long addr, unsigned long end,
 					struct mm_walk *walk)
 {
+	pmd_t *pmd = entry;
 	struct vm_area_struct *vma = walk->vma;
 
 	if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
@@ -6875,11 +6877,12 @@ static void mem_cgroup_cancel_attach(struct cgroup_subsys_state *css,
 	mem_cgroup_clear_mc();
 }
 
-static int mem_cgroup_move_charge_pte(pte_t *pte,
+static int mem_cgroup_move_charge_pte(void *entry,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
 {
 	int ret = 0;
+	pte_t *pte = entry;
 	struct vm_area_struct *vma = walk->vma;
 	union mc_target target;
 	struct page *page;
@@ -6936,10 +6939,11 @@ put:		/* get_mctgt_type() gets the page */
 	return 0;
 }
 
-static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
+static int mem_cgroup_move_charge_pmd(void *entry,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
 {
+	pmd_t *pmd = entry;
 	struct vm_area_struct *vma = walk->vma;
 	enum mc_target_type target_type;
 	union mc_target target;
diff --git mmotm-2014-05-21-16-57.orig/mm/mempolicy.c mmotm-2014-05-21-16-57/mm/mempolicy.c
index b8267f753748..f74cacb36b95 100644
--- mmotm-2014-05-21-16-57.orig/mm/mempolicy.c
+++ mmotm-2014-05-21-16-57/mm/mempolicy.c
@@ -490,9 +490,10 @@ struct queue_pages {
  * Scan through pages checking if pages follow certain conditions,
  * and move them to the pagelist if they do.
  */
-static int queue_pages_pte(pte_t *pte, unsigned long addr,
+static int queue_pages_pte(void *entry, unsigned long addr,
 			unsigned long next, struct mm_walk *walk)
 {
+	pte_t *pte = entry;
 	struct vm_area_struct *vma = walk->vma;
 	struct page *page;
 	struct queue_pages *qp = walk->private;
@@ -519,10 +520,11 @@ static int queue_pages_pte(pte_t *pte, unsigned long addr,
 	return 0;
 }
 
-static int queue_pages_hugetlb(pte_t *pte, unsigned long addr,
+static int queue_pages_hugetlb(void *ent, unsigned long addr,
 				unsigned long next, struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
+	pte_t *pte = ent;
 	struct queue_pages *qp = walk->private;
 	unsigned long flags = qp->flags;
 	int nid;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH -mm v2 08/11] pagewalk: update comment on walk_page_range()
  2014-06-12 21:48 ` Naoya Horiguchi
@ 2014-06-12 21:48   ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

The rewrite of the page table walker's common code is done, so this patch
updates the comment on walk_page_range() for future development.
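
To illustrate the return value convention documented here (a sketch only, not
part of this patch; the callback name is hypothetical), a handler can return a
positive value to stop the walk early and have that value propagated back
through walk_page_range():

static int find_present(void *entry, unsigned long addr,
			unsigned long next, struct mm_walk *walk)
{
	pte_t *pte = entry;
	unsigned long *found = walk->private;

	if (pte_present(*pte)) {
		*found = addr;
		return 1;	/* >0: stop walking, value reaches the caller */
	}
	return 0;		/* 0: continue the walk */
}

The caller then sees 1 if a present pte was found, 0 if the walk completed
without finding one, or a negative error code.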

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/pagewalk.c | 55 +++++++++++++++++++++++++++----------------------------
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
index b46c8882c643..626a80d4d9dd 100644
--- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
+++ mmotm-2014-05-21-16-57/mm/pagewalk.c
@@ -253,39 +253,38 @@ static int __walk_page_range(unsigned long start, unsigned long end,
  * walk_page_range - walk page table with caller specific callbacks
  *
  * Recursively walk the page table tree of the process represented by
- * @walk->mm within the virtual address range [@start, @end). In walking,
- * we can call caller-specific callback functions against each entry.
+ * @walk->mm within the virtual address range [@start, @end). During walking,
+ * we can call caller-specific callback functions against each leaf entry.
  *
  * Before starting to walk page table, some callers want to check whether
- * they really want to walk over the vma (for example by checking vm_flags.)
- * walk_page_test() and @walk->test_walk() do that check.
+ * they really want to walk over the current vma, typically by checking
+ * its vm_flags. walk_page_test() and @walk->test_walk() are used for this
+ * purpose.
  *
- * If any callback returns a non-zero value, the page table walk is aborted
- * immediately and the return value is propagated back to the caller.
- * Note that the meaning of the positive returned value can be defined
- * by the caller for its own purpose.
+ * Currently we have three types of possible leaf entries: pte (for normal
+ * pages), pmd (for thps), and hugetlb. We handle these three with pte_entry(),
+ * pmd_entry(), and hugetlb_entry(), respectively.
+ * If you don't set a function for one of these callbacks, the associated
+ * entries/pages are ignored.
+ * The return values of these three callbacks are commonly defined as follows:
+ *  - 0  : succeeded to handle the current entry; if the end address has not
+ *         been reached yet, continue to walk.
+ *  - >0 : succeeded to handle the current entry; return to the caller
+ *         with a caller-specific value.
+ *  - <0 : failed to handle the current entry; return to the caller
+ *         with an error code.
+ * We can set the same function for different callbacks; @walk->size helps
+ * the callback know which type of entry it was called for.
  *
- * If the caller defines multiple callbacks in different levels, the
- * callbacks are called in depth-first manner. It could happen that
- * multiple callbacks are called on a address. For example if some caller
- * defines test_walk(), pmd_entry(), and pte_entry(), then callbacks are
- * called in the order of test_walk(), pmd_entry(), and pte_entry().
- * If you don't want to go down to lower level at some point and move to
- * the next entry in the same level, you set @walk->skip to 1.
- * For example if you succeed to handle some pmd entry as trans_huge entry,
- * you need not call walk_pte_range() any more, so set it to avoid that.
- * We can't determine whether to go down to lower level with the return
- * value of the callback, because the whole range of return values (0, >0,
- * and <0) are used up for other meanings.
+ * struct mm_walk keeps current values of some common data like vma and pmd,
+ * which are useful for access from the callbacks. If you want to pass
+ * caller-specific data to the callbacks, @walk->private should be helpful.
  *
- * Each callback can access to the vma over which it is doing page table
- * walk right now via @walk->vma. @walk->vma is set to NULL in walking
- * outside a vma. If you want to access to some caller-specific data from
- * callbacks, @walk->private should be helpful.
- *
- * The callers should hold @walk->mm->mmap_sem. Note that the lower level
- * iterators can take page table lock in lowest level iteration and/or
- * in split_huge_page_pmd().
+ * Locking:
+ *   Callers of walk_page_range() and walk_page_vma() should hold
+ *   @walk->mm->mmap_sem, because these functions traverse the vma list
+ *   and/or access the vma's data. The page table lock is held while
+ *   running pmd_entry(), pte_entry(), and hugetlb_entry().
  */
 int walk_page_range(unsigned long start, unsigned long end,
 		    struct mm_walk *walk)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH -mm v2 09/11] fs/proc/task_mmu.c: refactor smaps
  2014-06-12 21:48 ` Naoya Horiguchi
@ 2014-06-12 21:48   ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

smaps_(pte|pmd)() are almost the same, so let's merge them into a single
function.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/proc/task_mmu.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
index a750d0842875..1f2eab58ae14 100644
--- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
+++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
@@ -437,7 +437,7 @@ struct mem_size_stats {
 	u64 pss;
 };
 
-static int smaps_pte(void *entry, unsigned long addr, unsigned long end,
+static int smaps_entry(void *entry, unsigned long addr, unsigned long end,
 			struct mm_walk *walk)
 {
 	pte_t *pte = entry;
@@ -490,15 +490,10 @@ static int smaps_pte(void *entry, unsigned long addr, unsigned long end,
 			mss->private_clean += ptent_size;
 		mss->pss += (ptent_size << PSS_SHIFT);
 	}
-	return 0;
-}
 
-static int smaps_pmd(void *entry, unsigned long addr, unsigned long end,
-			struct mm_walk *walk)
-{
-	struct mem_size_stats *mss = walk->private;
-	smaps_pte(entry, addr, end, walk);
-	mss->anonymous_thp += HPAGE_PMD_SIZE;
+	if (walk->size == PMD_SIZE)
+		mss->anonymous_thp += HPAGE_PMD_SIZE;
+
 	return 0;
 }
 
@@ -563,8 +558,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
 	struct mm_walk smaps_walk = {
-		.pmd_entry = smaps_pmd,
-		.pte_entry = smaps_pte,
+		.pmd_entry = smaps_entry,
+		.pte_entry = smaps_entry,
 		.mm = vma->vm_mm,
 		.vma = vma,
 		.private = &mss,
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH -mm v2 10/11] fs/proc/task_mmu.c: clean up gather_*_stats()
  2014-06-12 21:48 ` Naoya Horiguchi
@ 2014-06-12 21:48   ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

Most of the code in gather_(pte|pmd|hugetlb)_stats() is duplicated, so let's
merge them into a single function.

vm_normal_page() doesn't calculate pgoff correctly for hugetlbfs, so this
patch also fixes it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/proc/task_mmu.c | 58 ++++++------------------------------------------------
 mm/memory.c        |  5 ++---
 2 files changed, 8 insertions(+), 55 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
index 1f2eab58ae14..27ad736c6b30 100644
--- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
+++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
@@ -1243,63 +1243,17 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 	return page;
 }
 
-static int gather_pte_stats(void *entry, unsigned long addr,
+static int gather_stats_entry(void *entry, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
 	pte_t *pte = entry;
 	struct numa_maps *md = walk->private;
-
 	struct page *page = can_gather_numa_stats(*pte, walk->vma, addr);
-	if (!page)
-		return 0;
-	gather_stats(page, md, pte_dirty(*pte), 1);
-	return 0;
-}
-
-static int gather_pmd_stats(void *entry, unsigned long addr,
-		unsigned long end, struct mm_walk *walk)
-{
-	struct numa_maps *md = walk->private;
-	struct vm_area_struct *vma = walk->vma;
-	pte_t huge_pte = *(pte_t *)entry;
-	struct page *page;
-
-	page = can_gather_numa_stats(huge_pte, vma, addr);
 	if (page)
-		gather_stats(page, md, pte_dirty(huge_pte),
-			     HPAGE_PMD_SIZE/PAGE_SIZE);
+		gather_stats(page, md, pte_dirty(*pte),
+			     walk->size >> PAGE_SHIFT);
 	return 0;
 }
-#ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(void *entry, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
-{
-	pte_t *pte = entry;
-	struct numa_maps *md;
-	struct page *page;
-
-	if (pte_none(*pte))
-		return 0;
-
-	if (!pte_present(*pte))
-		return 0;
-
-	page = pte_page(*pte);
-	if (!page)
-		return 0;
-
-	md = walk->private;
-	gather_stats(page, md, pte_dirty(*pte), 1);
-	return 0;
-}
-
-#else
-static int gather_hugetlb_stats(void *entry, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
-{
-	return 0;
-}
-#endif
 
 /*
  * Display pages allocated per node and memory policy via /proc.
@@ -1324,9 +1278,9 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	/* Ensure we start with an empty set of numa_maps statistics. */
 	memset(md, 0, sizeof(*md));
 
-	walk.hugetlb_entry = gather_hugetlb_stats;
-	walk.pmd_entry = gather_pmd_stats;
-	walk.pte_entry = gather_pte_stats;
+	walk.hugetlb_entry = gather_stats_entry;
+	walk.pmd_entry = gather_stats_entry;
+	walk.pte_entry = gather_stats_entry;
 	walk.private = md;
 	walk.mm = mm;
 	walk.vma = vma;
diff --git mmotm-2014-05-21-16-57.orig/mm/memory.c mmotm-2014-05-21-16-57/mm/memory.c
index fd16b767dd68..7389dd04370f 100644
--- mmotm-2014-05-21-16-57.orig/mm/memory.c
+++ mmotm-2014-05-21-16-57/mm/memory.c
@@ -768,9 +768,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 				return NULL;
 			goto out;
 		} else {
-			unsigned long off;
-			off = (addr - vma->vm_start) >> PAGE_SHIFT;
-			if (pfn == vma->vm_pgoff + off)
+			unsigned long off = linear_page_index(vma, addr);
+			if (pfn == off)
 				return NULL;
 			if (!is_cow_mapping(vma->vm_flags))
 				return NULL;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH -mm v2 10/11] fs/proc/task_mmu.c: clean up gather_*_stats()
@ 2014-06-12 21:48   ` Naoya Horiguchi
  0 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

Most code of gather_(pte|pmd|hugetlb)_stats() are duplicate, so let's clean
them up with a single function.

vm_normal_page() doesn't calculate pgoff correctly for hugetlbfs, so this
patch also fixes it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 fs/proc/task_mmu.c | 58 ++++++------------------------------------------------
 mm/memory.c        |  5 ++---
 2 files changed, 8 insertions(+), 55 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
index 1f2eab58ae14..27ad736c6b30 100644
--- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
+++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
@@ -1243,63 +1243,17 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 	return page;
 }
 
-static int gather_pte_stats(void *entry, unsigned long addr,
+static int gather_stats_entry(void *entry, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
 	pte_t *pte = entry;
 	struct numa_maps *md = walk->private;
-
 	struct page *page = can_gather_numa_stats(*pte, walk->vma, addr);
-	if (!page)
-		return 0;
-	gather_stats(page, md, pte_dirty(*pte), 1);
-	return 0;
-}
-
-static int gather_pmd_stats(void *entry, unsigned long addr,
-		unsigned long end, struct mm_walk *walk)
-{
-	struct numa_maps *md = walk->private;
-	struct vm_area_struct *vma = walk->vma;
-	pte_t huge_pte = *(pte_t *)entry;
-	struct page *page;
-
-	page = can_gather_numa_stats(huge_pte, vma, addr);
 	if (page)
-		gather_stats(page, md, pte_dirty(huge_pte),
-			     HPAGE_PMD_SIZE/PAGE_SIZE);
+		gather_stats(page, md, pte_dirty(*pte),
+			     walk->size >> PAGE_SHIFT);
 	return 0;
 }
-#ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(void *entry, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
-{
-	pte_t *pte = entry;
-	struct numa_maps *md;
-	struct page *page;
-
-	if (pte_none(*pte))
-		return 0;
-
-	if (!pte_present(*pte))
-		return 0;
-
-	page = pte_page(*pte);
-	if (!page)
-		return 0;
-
-	md = walk->private;
-	gather_stats(page, md, pte_dirty(*pte), 1);
-	return 0;
-}
-
-#else
-static int gather_hugetlb_stats(void *entry, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
-{
-	return 0;
-}
-#endif
 
 /*
  * Display pages allocated per node and memory policy via /proc.
@@ -1324,9 +1278,9 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	/* Ensure we start with an empty set of numa_maps statistics. */
 	memset(md, 0, sizeof(*md));
 
-	walk.hugetlb_entry = gather_hugetlb_stats;
-	walk.pmd_entry = gather_pmd_stats;
-	walk.pte_entry = gather_pte_stats;
+	walk.hugetlb_entry = gather_stats_entry;
+	walk.pmd_entry = gather_stats_entry;
+	walk.pte_entry = gather_stats_entry;
 	walk.private = md;
 	walk.mm = mm;
 	walk.vma = vma;
diff --git mmotm-2014-05-21-16-57.orig/mm/memory.c mmotm-2014-05-21-16-57/mm/memory.c
index fd16b767dd68..7389dd04370f 100644
--- mmotm-2014-05-21-16-57.orig/mm/memory.c
+++ mmotm-2014-05-21-16-57/mm/memory.c
@@ -768,9 +768,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 				return NULL;
 			goto out;
 		} else {
-			unsigned long off;
-			off = (addr - vma->vm_start) >> PAGE_SHIFT;
-			if (pfn == vma->vm_pgoff + off)
+			unsigned long off = linear_page_index(vma, addr);
+			if (pfn == off)
 				return NULL;
 			if (!is_cow_mapping(vma->vm_flags))
 				return NULL;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH -mm v2 11/11] mincore: apply page table walker on do_mincore()
  2014-06-12 21:48 ` Naoya Horiguchi
@ 2014-06-12 21:48   ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 21:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

This patch makes do_mincore() use walk_page_vma(), which removes many lines
of code by reusing the common page table walk code.
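
For context, do_mincore() backs the mincore(2) syscall; a minimal userspace
illustration of the per-page residency vector it fills (an example program
only, not part of this patch):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 8 * page;
	size_t i;
	unsigned char *vec = malloc(len / page);
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (!vec || buf == MAP_FAILED)
		return 1;
	memset(buf, 0, 2 * page);		/* fault in the first two pages */
	if (mincore(buf, len, vec))
		return 1;
	for (i = 0; i < len / page; i++)	/* expect 1 1 0 0 0 0 0 0 */
		printf("page %zu: %d\n", i, vec[i] & 1);
	munmap(buf, len);
	free(vec);
	return 0;
}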

ChangeLog v2:
- change type of args of callbacks to void *
- move definition of mincore_walk to the start of the function to fix compiler
  warning

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/huge_memory.c |  20 ------
 mm/mincore.c     | 195 +++++++++++++++++++------------------------------------
 2 files changed, 67 insertions(+), 148 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/mm/huge_memory.c mmotm-2014-05-21-16-57/mm/huge_memory.c
index 6fd0668d4e1d..2671a9621d0e 100644
--- mmotm-2014-05-21-16-57.orig/mm/huge_memory.c
+++ mmotm-2014-05-21-16-57/mm/huge_memory.c
@@ -1379,26 +1379,6 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }
 
-int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long addr, unsigned long end,
-		unsigned char *vec)
-{
-	spinlock_t *ptl;
-	int ret = 0;
-
-	if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		/*
-		 * All logical pages in the range are present
-		 * if backed by a huge page.
-		 */
-		spin_unlock(ptl);
-		memset(vec, 1, (end - addr) >> PAGE_SHIFT);
-		ret = 1;
-	}
-
-	return ret;
-}
-
 int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
 		  unsigned long old_addr,
 		  unsigned long new_addr, unsigned long old_end,
diff --git mmotm-2014-05-21-16-57.orig/mm/mincore.c mmotm-2014-05-21-16-57/mm/mincore.c
index 725c80961048..d53b07548ce1 100644
--- mmotm-2014-05-21-16-57.orig/mm/mincore.c
+++ mmotm-2014-05-21-16-57/mm/mincore.c
@@ -19,38 +19,28 @@
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 
-static void mincore_hugetlb_page_range(struct vm_area_struct *vma,
-				unsigned long addr, unsigned long end,
-				unsigned char *vec)
+static int mincore_hugetlb(void *entry, unsigned long addr,
+			unsigned long end, struct mm_walk *walk)
 {
+	int err = 0;
 #ifdef CONFIG_HUGETLB_PAGE
-	struct hstate *h;
+	pte_t *pte = entry;
+	unsigned char present;
+	unsigned char *vec = walk->private;
 
-	h = hstate_vma(vma);
-	while (1) {
-		unsigned char present;
-		pte_t *ptep;
-		/*
-		 * Huge pages are always in RAM for now, but
-		 * theoretically it needs to be checked.
-		 */
-		ptep = huge_pte_offset(current->mm,
-				       addr & huge_page_mask(h));
-		present = ptep && !huge_pte_none(huge_ptep_get(ptep));
-		while (1) {
-			*vec = present;
-			vec++;
-			addr += PAGE_SIZE;
-			if (addr == end)
-				return;
-			/* check hugepage border */
-			if (!(addr & ~huge_page_mask(h)))
-				break;
-		}
-	}
+	/*
+	 * Hugepages under user process are always in RAM and never
+	 * swapped out, but theoretically it needs to be checked.
+	 */
+	present = pte && !huge_pte_none(huge_ptep_get(pte));
+	for (; addr != end; vec++, addr += PAGE_SIZE)
+		*vec = present;
+	cond_resched();
+	walk->private += (end - addr) >> PAGE_SHIFT;
 #else
 	BUG();
 #endif
+	return err;
 }
 
 /*
@@ -94,10 +84,11 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 	return present;
 }
 
-static void mincore_unmapped_range(struct vm_area_struct *vma,
-				unsigned long addr, unsigned long end,
-				unsigned char *vec)
+static int mincore_hole(unsigned long addr, unsigned long end,
+				struct mm_walk *walk)
 {
+	struct vm_area_struct *vma = walk->vma;
+	unsigned char *vec = walk->private;
 	unsigned long nr = (end - addr) >> PAGE_SHIFT;
 	int i;
 
@@ -111,110 +102,50 @@ static void mincore_unmapped_range(struct vm_area_struct *vma,
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
 	}
+	walk->private += nr;
+	return 0;
 }
 
-static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
+static int mincore_pte(void *entry, unsigned long addr, unsigned long end,
+			struct mm_walk *walk)
 {
-	unsigned long next;
-	spinlock_t *ptl;
-	pte_t *ptep;
-
-	ptep = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	do {
-		pte_t pte = *ptep;
-		pgoff_t pgoff;
-
-		next = addr + PAGE_SIZE;
-		if (pte_none(pte))
-			mincore_unmapped_range(vma, addr, next, vec);
-		else if (pte_present(pte))
+	pte_t *pte = entry;
+	struct vm_area_struct *vma = walk->vma;
+	unsigned char *vec = walk->private;
+	pgoff_t pgoff;
+
+	if (pte_present(*pte))
+		*vec = 1;
+	else if (pte_file(*pte)) {
+		pgoff = pte_to_pgoff(*pte);
+		*vec = mincore_page(vma->vm_file->f_mapping, pgoff);
+	} else { /* pte is a swap entry */
+		swp_entry_t entry = pte_to_swp_entry(*pte);
+
+		if (is_migration_entry(entry)) {
+			/* migration entries are always uptodate */
 			*vec = 1;
-		else if (pte_file(pte)) {
-			pgoff = pte_to_pgoff(pte);
-			*vec = mincore_page(vma->vm_file->f_mapping, pgoff);
-		} else { /* pte is a swap entry */
-			swp_entry_t entry = pte_to_swp_entry(pte);
-
-			if (is_migration_entry(entry)) {
-				/* migration entries are always uptodate */
-				*vec = 1;
-			} else {
+		} else {
 #ifdef CONFIG_SWAP
-				pgoff = entry.val;
-				*vec = mincore_page(swap_address_space(entry),
-					pgoff);
+			pgoff = entry.val;
+			*vec = mincore_page(swap_address_space(entry),
+					    pgoff);
 #else
-				WARN_ON(1);
-				*vec = 1;
+			WARN_ON(1);
+			*vec = 1;
 #endif
-			}
-		}
-		vec++;
-	} while (ptep++, addr = next, addr != end);
-	pte_unmap_unlock(ptep - 1, ptl);
-}
-
-static void mincore_pmd_range(struct vm_area_struct *vma, pud_t *pud,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
-{
-	unsigned long next;
-	pmd_t *pmd;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_trans_huge(*pmd)) {
-			if (mincore_huge_pmd(vma, pmd, addr, next, vec)) {
-				vec += (next - addr) >> PAGE_SHIFT;
-				continue;
-			}
-			/* fall through */
 		}
-		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
-			mincore_unmapped_range(vma, addr, next, vec);
-		else
-			mincore_pte_range(vma, pmd, addr, next, vec);
-		vec += (next - addr) >> PAGE_SHIFT;
-	} while (pmd++, addr = next, addr != end);
-}
-
-static void mincore_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
-{
-	unsigned long next;
-	pud_t *pud;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			mincore_unmapped_range(vma, addr, next, vec);
-		else
-			mincore_pmd_range(vma, pud, addr, next, vec);
-		vec += (next - addr) >> PAGE_SHIFT;
-	} while (pud++, addr = next, addr != end);
+	}
+	walk->private++;
+	return 0;
 }
 
-static void mincore_page_range(struct vm_area_struct *vma,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
+static int mincore_pmd(void *entry, unsigned long addr, unsigned long end,
+			struct mm_walk *walk)
 {
-	unsigned long next;
-	pgd_t *pgd;
-
-	pgd = pgd_offset(vma->vm_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			mincore_unmapped_range(vma, addr, next, vec);
-		else
-			mincore_pud_range(vma, pgd, addr, next, vec);
-		vec += (next - addr) >> PAGE_SHIFT;
-	} while (pgd++, addr = next, addr != end);
+	memset(walk->private, 1, (end - addr) >> PAGE_SHIFT);
+	walk->private += (end - addr) >> PAGE_SHIFT;
+	return 0;
 }
 
 /*
@@ -226,19 +157,27 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 {
 	struct vm_area_struct *vma;
 	unsigned long end;
+	int err;
+	struct mm_walk mincore_walk = {
+		.pmd_entry = mincore_pmd,
+		.pte_entry = mincore_pte,
+		.pte_hole = mincore_hole,
+		.hugetlb_entry = mincore_hugetlb,
+		.private = vec,
+	};
 
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-
+	mincore_walk.mm = vma->vm_mm;
+	mincore_walk.vma = vma;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
 
-	if (is_vm_hugetlb_page(vma))
-		mincore_hugetlb_page_range(vma, addr, end, vec);
+	err = walk_page_vma(vma, &mincore_walk);
+	if (err < 0)
+		return err;
 	else
-		mincore_page_range(vma, addr, end, vec);
-
-	return (end - addr) >> PAGE_SHIFT;
+		return (end - addr) >> PAGE_SHIFT;
 }
 
 /*
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 00/11] pagewalk: standardize current users, move pmd locking, apply to mincore
  2014-06-12 21:48 ` Naoya Horiguchi
@ 2014-06-12 21:56   ` Andrew Morton
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2014-06-12 21:56 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Dave Hansen, Hugh Dickins, Kirill A. Shutemov, linux-kernel

On Thu, 12 Jun 2014 17:48:00 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> This is ver.2 of page table walker patchset.
> 
> I move forward on this cleanup work, and added some improvement from the
> previous version. Major changes are:
>  - removed walk->skip which becomes removable due to refactoring existing
>    users
>  - commonalized the argments of entry handlers (pte|pmd|hugetlb)_entry()
>    which allows us to use the same function as multiple handlers.
> 

Are you sure you didn't miss anything?

mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff.patch
mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff-v2.patch
mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff-v3.patch
mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff-v3-fix.patch
pagewalk-update-page-table-walker-core.patch
pagewalk-update-page-table-walker-core-fix-end-address-calculation-in-walk_page_range.patch
pagewalk-update-page-table-walker-core-fix-end-address-calculation-in-walk_page_range-fix.patch
pagewalk-update-page-table-walker-core-fix.patch
pagewalk-add-walk_page_vma.patch
smaps-redefine-callback-functions-for-page-table-walker.patch
clear_refs-redefine-callback-functions-for-page-table-walker.patch
pagemap-redefine-callback-functions-for-page-table-walker.patch
pagemap-redefine-callback-functions-for-page-table-walker-fix.patch
numa_maps-redefine-callback-functions-for-page-table-walker.patch
memcg-redefine-callback-functions-for-page-table-walker.patch
arch-powerpc-mm-subpage-protc-use-walk_page_vma-instead-of-walk_page_range.patch
pagewalk-remove-argument-hmask-from-hugetlb_entry.patch
pagewalk-remove-argument-hmask-from-hugetlb_entry-fix.patch
pagewalk-remove-argument-hmask-from-hugetlb_entry-fix-fix.patch
mempolicy-apply-page-table-walker-on-queue_pages_range.patch
mm-pagewalkc-move-pte-null-check.patch
mm-prom-pid-clear_refs-avoid-split_huge_page.patch
#
mm-pagewalk-remove-pgd_entry-and-pud_entry.patch
mm-pagewalk-replace-mm_walk-skip-with-more-general-mm_walk-control.patch
madvise-cleanup-swapin_walk_pmd_entry.patch
memcg-separate-mem_cgroup_move_charge_pte_range.patch
memcg-separate-mem_cgroup_move_charge_pte_range-checkpatch-fixes.patch
arch-powerpc-mm-subpage-protc-cleanup-subpage_walk_pmd_entry.patch
mm-pagewalk-move-pmd_trans_huge_lock-from-callbacks-to-common-code.patch
mm-pagewalk-move-pmd_trans_huge_lock-from-callbacks-to-common-code-checkpatch-fixes.patch
mincore-apply-page-table-walker-on-do_mincore.patch


mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff.patch
appears to have disappeared; I didn't check the rest.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 06/11] pagewalk: add size to struct mm_walk
  2014-06-12 21:48   ` Naoya Horiguchi
@ 2014-06-12 22:07     ` Dave Hansen
  -1 siblings, 0 replies; 42+ messages in thread
From: Dave Hansen @ 2014-06-12 22:07 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, Hugh Dickins, Kirill A. Shutemov, linux-kernel

On 06/12/2014 02:48 PM, Naoya Horiguchi wrote:
> This variable is helpful if we try to share the callback function between
> multiple slots (for example between pte_entry() and pmd_entry()) as done
> in later patches.

smaps_pte() already does this:

static int smaps_pte(pte_t *pte, unsigned long addr, unsigned long end,
                        struct mm_walk *walk)
...
        unsigned long ptent_size = end - addr;

Other than the hugetlb handler, can't we always imply the size from
end-addr?
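
For comparison, a throwaway sketch of a handler shared between pte_entry and
pmd_entry under this series' void * convention, computing the span both ways
(hypothetical code, just to frame the question):

static int shared_entry(void *entry, unsigned long addr,
			unsigned long end, struct mm_walk *walk)
{
	/* (a) span derived from the callback arguments */
	unsigned long nr_from_args = (end - addr) >> PAGE_SHIFT;
	/* (b) span taken from the field this patch adds */
	unsigned long nr_from_size = walk->size >> PAGE_SHIFT;

	WARN_ON_ONCE(nr_from_args != nr_from_size);
	return 0;
}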

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 00/11] pagewalk: standardize current users, move pmd locking, apply to mincore
  2014-06-12 21:56   ` Andrew Morton
  (?)
@ 2014-06-12 22:21   ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 22:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Dave Hansen, Hugh Dickins, Kirill A. Shutemov, linux-kernel

On Thu, Jun 12, 2014 at 02:56:17PM -0700, Andrew Morton wrote:
> On Thu, 12 Jun 2014 17:48:00 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > This is ver.2 of page table walker patchset.
> > 
> > I move forward on this cleanup work, and added some improvement from the
> > previous version. Major changes are:
> >  - removed walk->skip which becomes removable due to refactoring existing
> >    users
> >  - commonalized the argments of entry handlers (pte|pmd|hugetlb)_entry()
> >    which allows us to use the same function as multiple handlers.
> > 
> 
> Are you sure you didn't miss anything?
> 
> mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff.patch
> mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff-v2.patch
> mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff-v3.patch
> mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff-v3-fix.patch
> pagewalk-update-page-table-walker-core.patch
> pagewalk-update-page-table-walker-core-fix-end-address-calculation-in-walk_page_range.patch
> pagewalk-update-page-table-walker-core-fix-end-address-calculation-in-walk_page_range-fix.patch
> pagewalk-update-page-table-walker-core-fix.patch
> pagewalk-add-walk_page_vma.patch
> smaps-redefine-callback-functions-for-page-table-walker.patch
> clear_refs-redefine-callback-functions-for-page-table-walker.patch
> pagemap-redefine-callback-functions-for-page-table-walker.patch
> pagemap-redefine-callback-functions-for-page-table-walker-fix.patch
> numa_maps-redefine-callback-functions-for-page-table-walker.patch
> memcg-redefine-callback-functions-for-page-table-walker.patch
> arch-powerpc-mm-subpage-protc-use-walk_page_vma-instead-of-walk_page_range.patch
> pagewalk-remove-argument-hmask-from-hugetlb_entry.patch
> pagewalk-remove-argument-hmask-from-hugetlb_entry-fix.patch
> pagewalk-remove-argument-hmask-from-hugetlb_entry-fix-fix.patch
> mempolicy-apply-page-table-walker-on-queue_pages_range.patch
> mm-pagewalkc-move-pte-null-check.patch

This patchset is based on mmotm-2014-05-21-16-57, so it is supposed to be
applied on top of the above patches.

> mm-prom-pid-clear_refs-avoid-split_huge_page.patch

I didn't assume this patch as a base, so please let me know if my patchset
conflicts with it.

> #
> mm-pagewalk-remove-pgd_entry-and-pud_entry.patch
> mm-pagewalk-replace-mm_walk-skip-with-more-general-mm_walk-control.patch
> madvise-cleanup-swapin_walk_pmd_entry.patch
> memcg-separate-mem_cgroup_move_charge_pte_range.patch
> memcg-separate-mem_cgroup_move_charge_pte_range-checkpatch-fixes.patch
> arch-powerpc-mm-subpage-protc-cleanup-subpage_walk_pmd_entry.patch
> mm-pagewalk-move-pmd_trans_huge_lock-from-callbacks-to-common-code.patch
> mm-pagewalk-move-pmd_trans_huge_lock-from-callbacks-to-common-code-checkpatch-fixes.patch
> mincore-apply-page-table-walker-on-do_mincore.patch

Yes, the current post is a revision of these patches.

And to be precise, this patchset also depends on a patch I posted today for
the fix discussed in the "kmemleak: Unable to handle kernel paging request"
thread, so it might be a source of conflicts.

Sorry if it's confusing.

Thanks,
Naoya Horiguchi

> 
> mm-hugetlbfs-fix-rmapping-for-anonymous-hugepages-with-page_pgoff.patch
> apepars to have disappeared, I didn't check the rest.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 06/11] pagewalk: add size to struct mm_walk
  2014-06-12 22:07     ` Dave Hansen
  (?)
@ 2014-06-12 22:36     ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-12 22:36 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, Hugh Dickins, Kirill A. Shutemov, linux-kernel

Hello Dave,

On Thu, Jun 12, 2014 at 03:07:11PM -0700, Dave Hansen wrote:
> On 06/12/2014 02:48 PM, Naoya Horiguchi wrote:
> > This variable is helpful if we try to share the callback function between
> > multiple slots (for example between pte_entry() and pmd_entry()) as done
> > in later patches.
> 
> smaps_pte() already does this:
> 
> static int smaps_pte(pte_t *pte, unsigned long addr, unsigned long end,
>                         struct mm_walk *walk)
> ...
>         unsigned long ptent_size = end - addr;
> 
> Other than the hugetlb handler, can't we always imply the size from
> end-addr?

Good point, thanks. I hadn't paid attention to this variable.

Currently we call this walk via walk_page_vma(), so addr and end are always
within [vma->vm_start, vma->vm_end]. If a vma is not aligned to a pmd
boundary, end - addr can be smaller than the pmd size at the boundary pmds.
Using end - addr still seems to cause no practical problem, because in that
case the first or final pmd can never hold a thp.
I'm not sure every caller (especially callers of walk_page_range()) passes
page-aligned addr/end, though, so the walk->size approach looks safer to me.
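
As a concrete (hypothetical) example with 2MB pmds: a vma covering
[0x10ff000, 0x1300000) hands the first pmd_entry() the range
[0x10ff000, 0x1200000), so end - addr is 0x101000 rather than PMD_SIZE
(0x200000); but a thp can only be mapped at a 2MB-aligned range fully
covered by the vma, so that boundary pmd can never be trans-huge and the
difference doesn't matter in practice.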

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 02/11] madvise: cleanup swapin_walk_pmd_entry()
  2014-06-12 21:48   ` Naoya Horiguchi
@ 2014-06-15 20:24     ` Hugh Dickins
  -1 siblings, 0 replies; 42+ messages in thread
From: Hugh Dickins @ 2014-06-15 20:24 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
	Kirill A. Shutemov, linux-kernel

On Thu, 12 Jun 2014, Naoya Horiguchi wrote:

> With the recent update on page table walker, we can use common code for
> the walking more. Unlike many other users, this swapin_walk expects to
> handle swap entries. As a result we should be careful about ptl locking.
> Swapin operation, read_swap_cache_async(), could cause page reclaim, so
> we can't keep holding ptl throughout this pte loop.
> In order to properly handle ptl in pte_entry(), this patch adds two new
> members on struct mm_walk.
> 
> This cleanup is necessary to get to the final form of page table walker,
> where we should do all caller's specific work on leaf entries (IOW, all
> pmd_entry() should be used for trans_pmd.)
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Hugh Dickins <hughd@google.com>

Sorry, I believe this (and probably several other of your conversions)
is badly flawed.

You have a pattern of doing pte_offset_map_lock() inside the page walker,
then dropping and regetting map and lock inside your pte handler.

But on, say, x86_32 with CONFIG_HIGHMEM, CONFIG_SMP and CONFIG_PREEMPT,
you may be preempted then run on a different cpu while atomic kmap and
lock are dropped: so that the pte pointer then used on return to
walk_pte_range() will no longer correspond to the right mapping.

Presumably that can be fixed by keeping the pte pointer in the mm_walk
structure; but I'm not at all sure that's the right thing to do.

I am not nearly so keen as you to reduce all these to per-pte callouts,
which seem inefficient to me.  It can be argued both ways on the less
important functions (like this madvise one); but I hope you don't try
to make this kind of conversion to fast paths like those in memory.c.
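
A condensed sketch of that sequence, assuming x86_32 with CONFIG_HIGHMEM,
CONFIG_SMP and CONFIG_PREEMPT, where pte_offset_map() is a kmap_atomic
(illustration only):

static void stale_pte_sketch(struct mm_struct *mm, pmd_t *pmd,
			     unsigned long addr)
{
	spinlock_t *ptl;
	pte_t *pte;

	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);	/* kmap on this cpu */

	/* what the new pte_entry callback does: */
	pte_unmap_unlock(pte, ptl);
	/* preemption may migrate the task to another cpu here */
	pte = pte_offset_map(pmd, addr);		/* fresh kmap, new address */
	spin_lock(ptl);

	/*
	 * walk_pte_range() never sees the updated 'pte': its local copy
	 * still points at the old kmap slot, so 'pte++' there no longer
	 * refers to this page table.
	 */
	pte_unmap_unlock(pte, ptl);
}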

Hugh

> ---
>  include/linux/mm.h |  4 ++++
>  mm/madvise.c       | 54 +++++++++++++++++++++++-------------------------------
>  mm/pagewalk.c      | 11 +++++------
>  3 files changed, 32 insertions(+), 37 deletions(-)
> 
> diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
> index b4aa6579f2b1..aa832161a1ff 100644
> --- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
> +++ mmotm-2014-05-21-16-57/include/linux/mm.h
> @@ -1108,6 +1108,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>   * @vma:       vma currently walked
>   * @skip:      internal control flag which is set when we skip the lower
>   *             level entries.
> + * @pmd:       current pmd entry
> + * @ptl:       page table lock associated with current entry
>   * @private:   private data for callbacks' use
>   *
>   * (see the comment on walk_page_range() for more details)
> @@ -1126,6 +1128,8 @@ struct mm_walk {
>  	struct mm_struct *mm;
>  	struct vm_area_struct *vma;
>  	int skip;
> +	pmd_t *pmd;
> +	spinlock_t *ptl;
>  	void *private;
>  };
>  
> diff --git mmotm-2014-05-21-16-57.orig/mm/madvise.c mmotm-2014-05-21-16-57/mm/madvise.c
> index a402f8fdc68e..06b390a6fbbd 100644
> --- mmotm-2014-05-21-16-57.orig/mm/madvise.c
> +++ mmotm-2014-05-21-16-57/mm/madvise.c
> @@ -135,38 +135,31 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  }
>  
>  #ifdef CONFIG_SWAP
> -static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> +/*
> + * Assuming that page table walker holds page table lock.
> + */
> +static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
>  	unsigned long end, struct mm_walk *walk)
>  {
> -	pte_t *orig_pte;
> -	struct vm_area_struct *vma = walk->private;
> -	unsigned long index;
> -
> -	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
> -		return 0;
> -
> -	for (index = start; index != end; index += PAGE_SIZE) {
> -		pte_t pte;
> -		swp_entry_t entry;
> -		struct page *page;
> -		spinlock_t *ptl;
> -
> -		orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
> -		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
> -		pte_unmap_unlock(orig_pte, ptl);
> -
> -		if (pte_present(pte) || pte_none(pte) || pte_file(pte))
> -			continue;
> -		entry = pte_to_swp_entry(pte);
> -		if (unlikely(non_swap_entry(entry)))
> -			continue;
> -
> -		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
> -								vma, index);
> -		if (page)
> -			page_cache_release(page);
> -	}
> +	pte_t ptent;
> +	pte_t *orig_pte = pte - ((start & (PMD_SIZE - 1)) >> PAGE_SHIFT);
> +	swp_entry_t entry;
> +	struct page *page;
>  
> +	ptent = *pte;
> +	pte_unmap_unlock(orig_pte, walk->ptl);
> +	if (pte_present(ptent) || pte_none(ptent) || pte_file(ptent))
> +		goto lock;
> +	entry = pte_to_swp_entry(ptent);
> +	if (unlikely(non_swap_entry(entry)))
> +		goto lock;
> +	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
> +				     walk->vma, start);
> +	if (page)
> +		page_cache_release(page);
> +lock:
> +	pte_offset_map(walk->pmd, start & PMD_MASK);
> +	spin_lock(walk->ptl);
>  	return 0;
>  }
>  
> @@ -175,8 +168,7 @@ static void force_swapin_readahead(struct vm_area_struct *vma,
>  {
>  	struct mm_walk walk = {
>  		.mm = vma->vm_mm,
> -		.pmd_entry = swapin_walk_pmd_entry,
> -		.private = vma,
> +		.pte_entry = swapin_walk_pte_entry,
>  	};
>  
>  	walk_page_range(start, end, &walk);
> diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> index e734f63276c2..24311d6f5c20 100644
> --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> @@ -27,10 +27,10 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
>  	struct mm_struct *mm = walk->mm;
>  	pte_t *pte;
>  	pte_t *orig_pte;
> -	spinlock_t *ptl;
>  	int err = 0;
>  
> -	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> +	walk->pmd = pmd;
> +	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &walk->ptl);
>  	do {
>  		if (pte_none(*pte)) {
>  			if (walk->pte_hole)
> @@ -48,7 +48,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
>  		if (err)
>  		       break;
>  	} while (pte++, addr += PAGE_SIZE, addr < end);
> -	pte_unmap_unlock(orig_pte, ptl);
> +	pte_unmap_unlock(orig_pte, walk->ptl);
>  	cond_resched();
>  	return addr == end ? 0 : err;
>  }
> @@ -172,7 +172,6 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
>  	unsigned long hmask = huge_page_mask(h);
>  	pte_t *pte;
>  	int err = 0;
> -	spinlock_t *ptl;
>  
>  	do {
>  		next = hugetlb_entry_end(h, addr, end);
> @@ -186,14 +185,14 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
>  				break;
>  			continue;
>  		}
> -		ptl = huge_pte_lock(h, mm, pte);
> +		walk->ptl = huge_pte_lock(h, mm, pte);
>  		/*
>  		 * Callers should have their own way to handle swap entries
>  		 * in walk->hugetlb_entry().
>  		 */
>  		if (walk->hugetlb_entry)
>  			err = walk->hugetlb_entry(pte, addr, next, walk);
> -		spin_unlock(ptl);
> +		spin_unlock(walk->ptl);
>  		if (err)
>  			break;
>  	} while (addr = next, addr != end);
> -- 
> 1.9.3
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 02/11] madvise: cleanup swapin_walk_pmd_entry()
  2014-06-15 20:24     ` Hugh Dickins
@ 2014-06-16 15:59       ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-16 15:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Dave Hansen, Andrew Morton, Kirill A. Shutemov, linux-kernel

On Sun, Jun 15, 2014 at 01:24:30PM -0700, Hugh Dickins wrote:
> On Thu, 12 Jun 2014, Naoya Horiguchi wrote:
> 
> > With the recent update on page table walker, we can use common code for
> > the walking more. Unlike many other users, this swapin_walk expects to
> > handle swap entries. As a result we should be careful about ptl locking.
> > Swapin operation, read_swap_cache_async(), could cause page reclaim, so
> > we can't keep holding ptl throughout this pte loop.
> > In order to properly handle ptl in pte_entry(), this patch adds two new
> > members on struct mm_walk.
> > 
> > This cleanup is necessary to get to the final form of page table walker,
> > where we should do all caller's specific work on leaf entries (IOW, all
> > pmd_entry() should be used for trans_pmd.)
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: Hugh Dickins <hughd@google.com>
> 
> Sorry, I believe this (and probably several other of your conversions)
> is badly flawed.
> 
> You have a pattern of doing pte_offset_map_lock() inside the page walker,
> then dropping and regetting map and lock inside your pte handler.
> 
> But on, say, x86_32 with CONFIG_HIGHMEM, CONFIG_SMP and CONFIG_PREEMPT,
> you may be preempted then run on a different cpu while atomic kmap and
> lock are dropped: so that the pte pointer then used on return to
> walk_pte_range() will no longer correspond to the right mapping.

Sorry, I didn't handle it correctly.

> Presumably that can be fixed by keeping the pte pointer in the mm_walk
> structure; but I'm not at all sure that's the right thing to do.

orig_pte should be updated whenever we call pte_offset_map_lock() or
pte_offset_map() inside the callback, so moving orig_pte into mm_walk is
what I think I'll do in the next post.
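
Concretely, one possible shape (hypothetical, assuming an orig_pte field is
added to struct mm_walk and kept up to date by walk_pte_range()):

static int example_pte_entry(pte_t *pte, unsigned long start,
			     unsigned long end, struct mm_walk *walk)
{
	/* drop the kmap and ptl around anything that may sleep */
	pte_unmap_unlock(walk->orig_pte, walk->ptl);
	/* ... sleepable work such as read_swap_cache_async() ... */
	walk->orig_pte = pte_offset_map(walk->pmd, start & PMD_MASK);
	spin_lock(walk->ptl);
	return 0;
}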

> I am not nearly so keen as you to reduce all these to per-pte callouts,
> which seem inefficient to me.

Right, we can't inline callbacks, so calling a callback per entry is slower
than open-coded walks. To improve that, a code generator which creates an
open-coded page walk for each user at build time might be one option. The
standardization done in this patchset is the first step toward that.
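
For the record, one purely hypothetical shape such a generator could take is
a macro that expands to an open-coded pte loop around an inlined per-pte body
(names and arguments made up for illustration):

#define DEFINE_PTE_WALK(name, body_fn)					\
static void name(struct vm_area_struct *vma, pmd_t *pmd,		\
		 unsigned long addr, unsigned long end, void *private)	\
{									\
	spinlock_t *ptl;						\
	pte_t *start_pte, *pte;						\
									\
	start_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); \
	do {								\
		body_fn(vma, pte, addr, private); /* can be inlined */	\
	} while (pte++, addr += PAGE_SIZE, addr < end);			\
	pte_unmap_unlock(start_pte, ptl);				\
}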

>  It can be argued both ways on the less
> important functions (like this madvise one); but I hope you don't try
> to make this kind of conversion to fast paths like those in memory.c.

OK. I won't touch the fast paths until the performance concern is solved.

Thanks,
Naoya Horiguchi

> Hugh
> 
> > ---
> >  include/linux/mm.h |  4 ++++
> >  mm/madvise.c       | 54 +++++++++++++++++++++++-------------------------------
> >  mm/pagewalk.c      | 11 +++++------
> >  3 files changed, 32 insertions(+), 37 deletions(-)
> > 
> > diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
> > index b4aa6579f2b1..aa832161a1ff 100644
> > --- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
> > +++ mmotm-2014-05-21-16-57/include/linux/mm.h
> > @@ -1108,6 +1108,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
> >   * @vma:       vma currently walked
> >   * @skip:      internal control flag which is set when we skip the lower
> >   *             level entries.
> > + * @pmd:       current pmd entry
> > + * @ptl:       page table lock associated with current entry
> >   * @private:   private data for callbacks' use
> >   *
> >   * (see the comment on walk_page_range() for more details)
> > @@ -1126,6 +1128,8 @@ struct mm_walk {
> >  	struct mm_struct *mm;
> >  	struct vm_area_struct *vma;
> >  	int skip;
> > +	pmd_t *pmd;
> > +	spinlock_t *ptl;
> >  	void *private;
> >  };
> >  
> > diff --git mmotm-2014-05-21-16-57.orig/mm/madvise.c mmotm-2014-05-21-16-57/mm/madvise.c
> > index a402f8fdc68e..06b390a6fbbd 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/madvise.c
> > +++ mmotm-2014-05-21-16-57/mm/madvise.c
> > @@ -135,38 +135,31 @@ static long madvise_behavior(struct vm_area_struct *vma,
> >  }
> >  
> >  #ifdef CONFIG_SWAP
> > -static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> > +/*
> > + * Assuming that page table walker holds page table lock.
> > + */
> > +static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
> >  	unsigned long end, struct mm_walk *walk)
> >  {
> > -	pte_t *orig_pte;
> > -	struct vm_area_struct *vma = walk->private;
> > -	unsigned long index;
> > -
> > -	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
> > -		return 0;
> > -
> > -	for (index = start; index != end; index += PAGE_SIZE) {
> > -		pte_t pte;
> > -		swp_entry_t entry;
> > -		struct page *page;
> > -		spinlock_t *ptl;
> > -
> > -		orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
> > -		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
> > -		pte_unmap_unlock(orig_pte, ptl);
> > -
> > -		if (pte_present(pte) || pte_none(pte) || pte_file(pte))
> > -			continue;
> > -		entry = pte_to_swp_entry(pte);
> > -		if (unlikely(non_swap_entry(entry)))
> > -			continue;
> > -
> > -		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
> > -								vma, index);
> > -		if (page)
> > -			page_cache_release(page);
> > -	}
> > +	pte_t ptent;
> > +	pte_t *orig_pte = pte - ((start & (PMD_SIZE - 1)) >> PAGE_SHIFT);
> > +	swp_entry_t entry;
> > +	struct page *page;
> >  
> > +	ptent = *pte;
> > +	pte_unmap_unlock(orig_pte, walk->ptl);
> > +	if (pte_present(ptent) || pte_none(ptent) || pte_file(ptent))
> > +		goto lock;
> > +	entry = pte_to_swp_entry(ptent);
> > +	if (unlikely(non_swap_entry(entry)))
> > +		goto lock;
> > +	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
> > +				     walk->vma, start);
> > +	if (page)
> > +		page_cache_release(page);
> > +lock:
> > +	pte_offset_map(walk->pmd, start & PMD_MASK);
> > +	spin_lock(walk->ptl);
> >  	return 0;
> >  }
> >  
> > @@ -175,8 +168,7 @@ static void force_swapin_readahead(struct vm_area_struct *vma,
> >  {
> >  	struct mm_walk walk = {
> >  		.mm = vma->vm_mm,
> > -		.pmd_entry = swapin_walk_pmd_entry,
> > -		.private = vma,
> > +		.pte_entry = swapin_walk_pte_entry,
> >  	};
> >  
> >  	walk_page_range(start, end, &walk);
> > diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> > index e734f63276c2..24311d6f5c20 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> > +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> > @@ -27,10 +27,10 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
> >  	struct mm_struct *mm = walk->mm;
> >  	pte_t *pte;
> >  	pte_t *orig_pte;
> > -	spinlock_t *ptl;
> >  	int err = 0;
> >  
> > -	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> > +	walk->pmd = pmd;
> > +	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &walk->ptl);
> >  	do {
> >  		if (pte_none(*pte)) {
> >  			if (walk->pte_hole)
> > @@ -48,7 +48,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
> >  		if (err)
> >  		       break;
> >  	} while (pte++, addr += PAGE_SIZE, addr < end);
> > -	pte_unmap_unlock(orig_pte, ptl);
> > +	pte_unmap_unlock(orig_pte, walk->ptl);
> >  	cond_resched();
> >  	return addr == end ? 0 : err;
> >  }
> > @@ -172,7 +172,6 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> >  	unsigned long hmask = huge_page_mask(h);
> >  	pte_t *pte;
> >  	int err = 0;
> > -	spinlock_t *ptl;
> >  
> >  	do {
> >  		next = hugetlb_entry_end(h, addr, end);
> > @@ -186,14 +185,14 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> >  				break;
> >  			continue;
> >  		}
> > -		ptl = huge_pte_lock(h, mm, pte);
> > +		walk->ptl = huge_pte_lock(h, mm, pte);
> >  		/*
> >  		 * Callers should have their own way to handle swap entries
> >  		 * in walk->hugetlb_entry().
> >  		 */
> >  		if (walk->hugetlb_entry)
> >  			err = walk->hugetlb_entry(pte, addr, next, walk);
> > -		spin_unlock(ptl);
> > +		spin_unlock(walk->ptl);
> >  		if (err)
> >  			break;
> >  	} while (addr = next, addr != end);
> > -- 
> > 1.9.3
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 02/11] madvise: cleanup swapin_walk_pmd_entry()
@ 2014-06-16 15:59       ` Naoya Horiguchi
  0 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-16 15:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Dave Hansen, Andrew Morton, Kirill A. Shutemov, linux-kernel

On Sun, Jun 15, 2014 at 01:24:30PM -0700, Hugh Dickins wrote:
> On Thu, 12 Jun 2014, Naoya Horiguchi wrote:
> 
> > With the recent update on page table walker, we can use common code for
> > the walking more. Unlike many other users, this swapin_walk expects to
> > handle swap entries. As a result we should be careful about ptl locking.
> > Swapin operation, read_swap_cache_async(), could cause page reclaim, so
> > we can't keep holding ptl throughout this pte loop.
> > In order to properly handle ptl in pte_entry(), this patch adds two new
> > members on struct mm_walk.
> > 
> > This cleanup is necessary to get to the final form of page table walker,
> > where we should do all caller's specific work on leaf entries (IOW, all
> > pmd_entry() should be used for trans_pmd.)
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: Hugh Dickins <hughd@google.com>
> 
> Sorry, I believe this (and probably several other of your conversions)
> is badly flawed.
> 
> You have a pattern of doing pte_offset_map_lock() inside the page walker,
> then dropping and regetting map and lock inside your pte handler.
> 
> But on, say, x86_32 with CONFIG_HIGHMEM, CONFIG_SMP and CONFIG_PREEMPT,
> you may be preempted then run on a different cpu while atomic kmap and
> lock are dropped: so that the pte pointer then used on return to
> walk_pte_range() will no longer correspond to the right mapping.

Sorry, I didn't handle it correctly.
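
To restate the problem (a simplified sketch of the patched code, not the
exact source):

	/* walk_pte_range(), roughly */
	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &walk->ptl);
					/* atomic kmap on HIGHMEM */
	do {
		err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
		/*
		 * swapin_walk_pte_entry() dropped the kmap and the lock,
		 * possibly slept in read_swap_cache_async(), then did
		 * pte_offset_map() again -- but it throws away the pointer
		 * it gets back, and after preemption the new mapping may
		 * live at a different address anyway.
		 */
	} while (pte++, addr += PAGE_SIZE, addr < end);
					/* so this pte++ can walk a stale mapping */
	pte_unmap_unlock(orig_pte, walk->ptl);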

> Presumably that can be fixed by keeping the pte pointer in the mm_walk
> structure; but I'm not at all sure that's the right thing to do.

orig_pte should be updated if we call pte_offset_map_lock() or pte_offset_map()
inside the callback. So moving orig_pte into mm_walk is what I think I'll
do in the next post.
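
Something along these lines is what I have in mind (just a sketch; the
actual field name and interface may end up different):

	struct mm_walk {
		...
		pmd_t *pmd;
		pte_t *orig_pte;	/* updated when a callback remaps */
		spinlock_t *ptl;
		...
	};

	/* in a pte_entry() that has to drop the lock */
	pte_unmap_unlock(walk->orig_pte, walk->ptl);
	/* ... sleeping work such as read_swap_cache_async() ... */
	walk->orig_pte = pte_offset_map(walk->pmd, start & PMD_MASK);
	spin_lock(walk->ptl);

walk_pte_range() would then recompute its cursor from walk->orig_pte after
each callback instead of trusting its local pte pointer.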

> I am not nearly so keen as you to reduce all these to per-pte callouts,
> which seem inefficient to me.

Right, we can't inline callbacks, so calling callbacks more often is
slower than open-coded walks. To make it better, a code generator which
creates open-coded page walk code for each user at build time might be
one option. The standardization done in this patchset is the first step
toward that.
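
For illustration only (this is not code from the patchset), the difference
is roughly:

	/* callback-based: an indirect call per pte, opaque to the compiler */
	for (; addr < end; pte++, addr += PAGE_SIZE)
		err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, walk);

	/* generated/open-coded: the per-pte work (here a hypothetical
	 * helper) is visible at the call site and can be inlined */
	for (; addr < end; pte++, addr += PAGE_SIZE)
		swapin_one_pte(pte, addr, vma);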

>  It can be argued both ways on the less
> important functions (like this madvise one); but I hope you don't try
> to make this kind of conversion to fast paths like those in memory.c.

OK. I won't touch fast paths until the performance concern is solved.

Thanks,
Naoya Horiguchi

> Hugh
> 
> > ---
> >  include/linux/mm.h |  4 ++++
> >  mm/madvise.c       | 54 +++++++++++++++++++++++-------------------------------
> >  mm/pagewalk.c      | 11 +++++------
> >  3 files changed, 32 insertions(+), 37 deletions(-)
> > 
> > diff --git mmotm-2014-05-21-16-57.orig/include/linux/mm.h mmotm-2014-05-21-16-57/include/linux/mm.h
> > index b4aa6579f2b1..aa832161a1ff 100644
> > --- mmotm-2014-05-21-16-57.orig/include/linux/mm.h
> > +++ mmotm-2014-05-21-16-57/include/linux/mm.h
> > @@ -1108,6 +1108,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
> >   * @vma:       vma currently walked
> >   * @skip:      internal control flag which is set when we skip the lower
> >   *             level entries.
> > + * @pmd:       current pmd entry
> > + * @ptl:       page table lock associated with current entry
> >   * @private:   private data for callbacks' use
> >   *
> >   * (see the comment on walk_page_range() for more details)
> > @@ -1126,6 +1128,8 @@ struct mm_walk {
> >  	struct mm_struct *mm;
> >  	struct vm_area_struct *vma;
> >  	int skip;
> > +	pmd_t *pmd;
> > +	spinlock_t *ptl;
> >  	void *private;
> >  };
> >  
> > diff --git mmotm-2014-05-21-16-57.orig/mm/madvise.c mmotm-2014-05-21-16-57/mm/madvise.c
> > index a402f8fdc68e..06b390a6fbbd 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/madvise.c
> > +++ mmotm-2014-05-21-16-57/mm/madvise.c
> > @@ -135,38 +135,31 @@ static long madvise_behavior(struct vm_area_struct *vma,
> >  }
> >  
> >  #ifdef CONFIG_SWAP
> > -static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> > +/*
> > + * Assuming that page table walker holds page table lock.
> > + */
> > +static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
> >  	unsigned long end, struct mm_walk *walk)
> >  {
> > -	pte_t *orig_pte;
> > -	struct vm_area_struct *vma = walk->private;
> > -	unsigned long index;
> > -
> > -	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
> > -		return 0;
> > -
> > -	for (index = start; index != end; index += PAGE_SIZE) {
> > -		pte_t pte;
> > -		swp_entry_t entry;
> > -		struct page *page;
> > -		spinlock_t *ptl;
> > -
> > -		orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
> > -		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
> > -		pte_unmap_unlock(orig_pte, ptl);
> > -
> > -		if (pte_present(pte) || pte_none(pte) || pte_file(pte))
> > -			continue;
> > -		entry = pte_to_swp_entry(pte);
> > -		if (unlikely(non_swap_entry(entry)))
> > -			continue;
> > -
> > -		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
> > -								vma, index);
> > -		if (page)
> > -			page_cache_release(page);
> > -	}
> > +	pte_t ptent;
> > +	pte_t *orig_pte = pte - ((start & (PMD_SIZE - 1)) >> PAGE_SHIFT);
> > +	swp_entry_t entry;
> > +	struct page *page;
> >  
> > +	ptent = *pte;
> > +	pte_unmap_unlock(orig_pte, walk->ptl);
> > +	if (pte_present(ptent) || pte_none(ptent) || pte_file(ptent))
> > +		goto lock;
> > +	entry = pte_to_swp_entry(ptent);
> > +	if (unlikely(non_swap_entry(entry)))
> > +		goto lock;
> > +	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
> > +				     walk->vma, start);
> > +	if (page)
> > +		page_cache_release(page);
> > +lock:
> > +	pte_offset_map(walk->pmd, start & PMD_MASK);
> > +	spin_lock(walk->ptl);
> >  	return 0;
> >  }
> >  
> > @@ -175,8 +168,7 @@ static void force_swapin_readahead(struct vm_area_struct *vma,
> >  {
> >  	struct mm_walk walk = {
> >  		.mm = vma->vm_mm,
> > -		.pmd_entry = swapin_walk_pmd_entry,
> > -		.private = vma,
> > +		.pte_entry = swapin_walk_pte_entry,
> >  	};
> >  
> >  	walk_page_range(start, end, &walk);
> > diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> > index e734f63276c2..24311d6f5c20 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> > +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> > @@ -27,10 +27,10 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
> >  	struct mm_struct *mm = walk->mm;
> >  	pte_t *pte;
> >  	pte_t *orig_pte;
> > -	spinlock_t *ptl;
> >  	int err = 0;
> >  
> > -	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> > +	walk->pmd = pmd;
> > +	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &walk->ptl);
> >  	do {
> >  		if (pte_none(*pte)) {
> >  			if (walk->pte_hole)
> > @@ -48,7 +48,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr,
> >  		if (err)
> >  		       break;
> >  	} while (pte++, addr += PAGE_SIZE, addr < end);
> > -	pte_unmap_unlock(orig_pte, ptl);
> > +	pte_unmap_unlock(orig_pte, walk->ptl);
> >  	cond_resched();
> >  	return addr == end ? 0 : err;
> >  }
> > @@ -172,7 +172,6 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> >  	unsigned long hmask = huge_page_mask(h);
> >  	pte_t *pte;
> >  	int err = 0;
> > -	spinlock_t *ptl;
> >  
> >  	do {
> >  		next = hugetlb_entry_end(h, addr, end);
> > @@ -186,14 +185,14 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> >  				break;
> >  			continue;
> >  		}
> > -		ptl = huge_pte_lock(h, mm, pte);
> > +		walk->ptl = huge_pte_lock(h, mm, pte);
> >  		/*
> >  		 * Callers should have their own way to handle swap entries
> >  		 * in walk->hugetlb_entry().
> >  		 */
> >  		if (walk->hugetlb_entry)
> >  			err = walk->hugetlb_entry(pte, addr, next, walk);
> > -		spin_unlock(ptl);
> > +		spin_unlock(walk->ptl);
> >  		if (err)
> >  			break;
> >  	} while (addr = next, addr != end);
> > -- 
> > 1.9.3
> > 
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
  2014-06-12 21:48   ` Naoya Horiguchi
@ 2014-06-17 14:27     ` Jerome Marchand
  -1 siblings, 0 replies; 42+ messages in thread
From: Jerome Marchand @ 2014-06-17 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
> Now all of current users of page table walker are canonicalized, i.e.
> pmd_entry() handles only trans_pmd entry, and pte_entry() handles pte entry.
> So we can factorize common code more.
> This patch moves pmd_trans_huge_lock() in each pmd_entry() to pagewalk core.
> 
> ChangeLog v2:
> - add null check walk->vma in walk_pmd_range()

An older version of this patch already made it to linux-next (commit
b0e08c5) and I've actually hit the NULL pointer dereference.

Moreover, that patch (or maybe another recent pagewalk patch) breaks
/proc/<pid>/smaps. All fields that should have been filled by
smaps_pte() are almost always zero (and when it isn't, it's always a
multiple of 2MB). It seems to me that the page walk never goes below
pmd level.

Jerome

> - move comment update into a separate patch
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  arch/powerpc/mm/subpage-prot.c |  2 ++
>  fs/proc/task_mmu.c             | 66 ++++++++++++++----------------------------
>  mm/memcontrol.c                | 55 ++++++++++-------------------------
>  mm/pagewalk.c                  | 18 ++++++++++--
>  4 files changed, 55 insertions(+), 86 deletions(-)
> 
> diff --git mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> index fa9fb5b4c66c..d0d94ac606f3 100644
> --- mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c
> +++ mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> @@ -135,7 +135,9 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
>  				  unsigned long end, struct mm_walk *walk)
>  {
>  	struct vm_area_struct *vma = walk->vma;
> +	spin_unlock(walk->ptl);
>  	split_huge_page_pmd(vma, addr, pmd);
> +	spin_lock(walk->ptl);
>  	return 0;
>  }
>  
> diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> index fa6d6a4e85b3..059206ea3c6b 100644
> --- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
> +++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> @@ -496,15 +496,8 @@ static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
>  			struct mm_walk *walk)
>  {
>  	struct mem_size_stats *mss = walk->private;
> -	spinlock_t *ptl;
> -
> -	if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
> -		smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> -		spin_unlock(ptl);
> -		mss->anonymous_thp += HPAGE_PMD_SIZE;
> -		/* don't call smaps_pte() */
> -		walk->skip = 1;
> -	}
> +	smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> +	mss->anonymous_thp += HPAGE_PMD_SIZE;
>  	return 0;
>  }
>  
> @@ -983,31 +976,21 @@ static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	struct vm_area_struct *vma = walk->vma;
>  	struct pagemapread *pm = walk->private;
>  	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> -	spinlock_t *ptl;
> -
> -	if (!vma)
> -		return err;
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		int pmd_flags2;
> +	int pmd_flags2;
>  
> -		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> -			pmd_flags2 = __PM_SOFT_DIRTY;
> -		else
> -			pmd_flags2 = 0;
> +	if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> +		pmd_flags2 = __PM_SOFT_DIRTY;
> +	else
> +		pmd_flags2 = 0;
>  
> -		for (; addr != end; addr += PAGE_SIZE) {
> -			unsigned long offset;
> +	for (; addr != end; addr += PAGE_SIZE) {
> +		unsigned long offset;
>  
> -			offset = (addr & ~PAGEMAP_WALK_MASK) >>
> -					PAGE_SHIFT;
> -			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> -			err = add_to_pagemap(addr, &pme, pm);
> -			if (err)
> -				break;
> -		}
> -		spin_unlock(ptl);
> -		/* don't call pagemap_pte() */
> -		walk->skip = 1;
> +		offset = (addr & ~PAGEMAP_WALK_MASK) >> PAGE_SHIFT;
> +		thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> +		err = add_to_pagemap(addr, &pme, pm);
> +		if (err)
> +			break;
>  	}
>  	return err;
>  }
> @@ -1277,20 +1260,13 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
>  {
>  	struct numa_maps *md = walk->private;
>  	struct vm_area_struct *vma = walk->vma;
> -	spinlock_t *ptl;
> -
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		pte_t huge_pte = *(pte_t *)pmd;
> -		struct page *page;
> -
> -		page = can_gather_numa_stats(huge_pte, vma, addr);
> -		if (page)
> -			gather_stats(page, md, pte_dirty(huge_pte),
> -				     HPAGE_PMD_SIZE/PAGE_SIZE);
> -		spin_unlock(ptl);
> -		/* don't call gather_pte_stats() */
> -		walk->skip = 1;
> -	}
> +	pte_t huge_pte = *(pte_t *)pmd;
> +	struct page *page;
> +
> +	page = can_gather_numa_stats(huge_pte, vma, addr);
> +	if (page)
> +		gather_stats(page, md, pte_dirty(huge_pte),
> +			     HPAGE_PMD_SIZE/PAGE_SIZE);
>  	return 0;
>  }
>  #ifdef CONFIG_HUGETLB_PAGE
> diff --git mmotm-2014-05-21-16-57.orig/mm/memcontrol.c mmotm-2014-05-21-16-57/mm/memcontrol.c
> index 01a66a208769..bb987cb9e043 100644
> --- mmotm-2014-05-21-16-57.orig/mm/memcontrol.c
> +++ mmotm-2014-05-21-16-57/mm/memcontrol.c
> @@ -6723,15 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
>  					struct mm_walk *walk)
>  {
>  	struct vm_area_struct *vma = walk->vma;
> -	spinlock_t *ptl;
> -
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> -			mc.precharge += HPAGE_PMD_NR;
> -		spin_unlock(ptl);
> -		/* don't call mem_cgroup_count_precharge_pte() */
> -		walk->skip = 1;
> -	}
> +
> +	if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> +		mc.precharge += HPAGE_PMD_NR;
>  	return 0;
>  }
>  
> @@ -6952,38 +6946,21 @@ static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
>  	struct page *page;
>  	struct page_cgroup *pc;
>  
> -	/*
> -	 * We don't take compound_lock() here but no race with splitting thp
> -	 * happens because:
> -	 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is not
> -	 *    under splitting, which means there's no concurrent thp split,
> -	 *  - if another thread runs into split_huge_page() just after we
> -	 *    entered this if-block, the thread must wait for page table lock
> -	 *    to be unlocked in __split_huge_page_splitting(), where the main
> -	 *    part of thp split is not executed yet.
> -	 */
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		if (mc.precharge < HPAGE_PMD_NR) {
> -			spin_unlock(ptl);
> -			return 0;
> -		}
> -		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> -		if (target_type == MC_TARGET_PAGE) {
> -			page = target.page;
> -			if (!isolate_lru_page(page)) {
> -				pc = lookup_page_cgroup(page);
> -				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> -							pc, mc.from, mc.to)) {
> -					mc.precharge -= HPAGE_PMD_NR;
> -					mc.moved_charge += HPAGE_PMD_NR;
> -				}
> -				putback_lru_page(page);
> +	if (mc.precharge < HPAGE_PMD_NR)
> +		return 0;
> +	target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> +	if (target_type == MC_TARGET_PAGE) {
> +		page = target.page;
> +		if (!isolate_lru_page(page)) {
> +			pc = lookup_page_cgroup(page);
> +			if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> +						     pc, mc.from, mc.to)) {
> +				mc.precharge -= HPAGE_PMD_NR;
> +				mc.moved_charge += HPAGE_PMD_NR;
>  			}
> -			put_page(page);
> +			putback_lru_page(page);
>  		}
> -		spin_unlock(ptl);
> -		/* don't call mem_cgroup_move_charge_pte() */
> -		walk->skip = 1;
> +		put_page(page);
>  	}
>  	return 0;
>  }
> diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> index 24311d6f5c20..f1a3417d0b51 100644
> --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
>  			continue;
>  		}
>  
> -		if (walk->pmd_entry) {
> -			err = walk->pmd_entry(pmd, addr, next, walk);
> +		/*
> +		 * We don't take compound_lock() here but no race with splitting
> +		 * thp happens because:
> +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
> +		 *    not under splitting, which means there's no concurrent
> +		 *    thp split,
> +		 *  - if another thread runs into split_huge_page() just after
> +		 *    we entered this if-block, the thread must wait for page
> +		 *    table lock to be unlocked in __split_huge_page_splitting(),
> +		 *    where the main part of thp split is not executed yet.
> +		 */
> +		if (walk->pmd_entry && walk->vma) {
> +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
> +				err = walk->pmd_entry(pmd, addr, next, walk);
> +				spin_unlock(walk->ptl);
> +			}
>  			if (skip_lower_level_walking(walk))
>  				continue;
>  			if (err)
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
@ 2014-06-17 14:27     ` Jerome Marchand
  0 siblings, 0 replies; 42+ messages in thread
From: Jerome Marchand @ 2014-06-17 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Dave Hansen, Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	linux-kernel

On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
> Now all of current users of page table walker are canonicalized, i.e.
> pmd_entry() handles only trans_pmd entry, and pte_entry() handles pte entry.
> So we can factorize common code more.
> This patch moves pmd_trans_huge_lock() in each pmd_entry() to pagewalk core.
> 
> ChangeLog v2:
> - add null check walk->vma in walk_pmd_range()

An older version of this patch already made it to linux-next (commit
b0e08c5) and I've actually hit the NULL pointer dereference.

Moreover, that patch (or maybe another recent pagewalk patch) breaks
/proc/<pid>/smaps. All fields that should have been filled by
smaps_pte() are almost always zero (and when it isn't, it's always a
multiple of 2MB). It seems to me that the page walk never goes below
pmd level.

Jerome

> - move comment update into a separate patch
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  arch/powerpc/mm/subpage-prot.c |  2 ++
>  fs/proc/task_mmu.c             | 66 ++++++++++++++----------------------------
>  mm/memcontrol.c                | 55 ++++++++++-------------------------
>  mm/pagewalk.c                  | 18 ++++++++++--
>  4 files changed, 55 insertions(+), 86 deletions(-)
> 
> diff --git mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> index fa9fb5b4c66c..d0d94ac606f3 100644
> --- mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c
> +++ mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> @@ -135,7 +135,9 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
>  				  unsigned long end, struct mm_walk *walk)
>  {
>  	struct vm_area_struct *vma = walk->vma;
> +	spin_unlock(walk->ptl);
>  	split_huge_page_pmd(vma, addr, pmd);
> +	spin_lock(walk->ptl);
>  	return 0;
>  }
>  
> diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> index fa6d6a4e85b3..059206ea3c6b 100644
> --- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
> +++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> @@ -496,15 +496,8 @@ static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
>  			struct mm_walk *walk)
>  {
>  	struct mem_size_stats *mss = walk->private;
> -	spinlock_t *ptl;
> -
> -	if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
> -		smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> -		spin_unlock(ptl);
> -		mss->anonymous_thp += HPAGE_PMD_SIZE;
> -		/* don't call smaps_pte() */
> -		walk->skip = 1;
> -	}
> +	smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> +	mss->anonymous_thp += HPAGE_PMD_SIZE;
>  	return 0;
>  }
>  
> @@ -983,31 +976,21 @@ static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	struct vm_area_struct *vma = walk->vma;
>  	struct pagemapread *pm = walk->private;
>  	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> -	spinlock_t *ptl;
> -
> -	if (!vma)
> -		return err;
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		int pmd_flags2;
> +	int pmd_flags2;
>  
> -		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> -			pmd_flags2 = __PM_SOFT_DIRTY;
> -		else
> -			pmd_flags2 = 0;
> +	if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> +		pmd_flags2 = __PM_SOFT_DIRTY;
> +	else
> +		pmd_flags2 = 0;
>  
> -		for (; addr != end; addr += PAGE_SIZE) {
> -			unsigned long offset;
> +	for (; addr != end; addr += PAGE_SIZE) {
> +		unsigned long offset;
>  
> -			offset = (addr & ~PAGEMAP_WALK_MASK) >>
> -					PAGE_SHIFT;
> -			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> -			err = add_to_pagemap(addr, &pme, pm);
> -			if (err)
> -				break;
> -		}
> -		spin_unlock(ptl);
> -		/* don't call pagemap_pte() */
> -		walk->skip = 1;
> +		offset = (addr & ~PAGEMAP_WALK_MASK) >> PAGE_SHIFT;
> +		thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> +		err = add_to_pagemap(addr, &pme, pm);
> +		if (err)
> +			break;
>  	}
>  	return err;
>  }
> @@ -1277,20 +1260,13 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
>  {
>  	struct numa_maps *md = walk->private;
>  	struct vm_area_struct *vma = walk->vma;
> -	spinlock_t *ptl;
> -
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		pte_t huge_pte = *(pte_t *)pmd;
> -		struct page *page;
> -
> -		page = can_gather_numa_stats(huge_pte, vma, addr);
> -		if (page)
> -			gather_stats(page, md, pte_dirty(huge_pte),
> -				     HPAGE_PMD_SIZE/PAGE_SIZE);
> -		spin_unlock(ptl);
> -		/* don't call gather_pte_stats() */
> -		walk->skip = 1;
> -	}
> +	pte_t huge_pte = *(pte_t *)pmd;
> +	struct page *page;
> +
> +	page = can_gather_numa_stats(huge_pte, vma, addr);
> +	if (page)
> +		gather_stats(page, md, pte_dirty(huge_pte),
> +			     HPAGE_PMD_SIZE/PAGE_SIZE);
>  	return 0;
>  }
>  #ifdef CONFIG_HUGETLB_PAGE
> diff --git mmotm-2014-05-21-16-57.orig/mm/memcontrol.c mmotm-2014-05-21-16-57/mm/memcontrol.c
> index 01a66a208769..bb987cb9e043 100644
> --- mmotm-2014-05-21-16-57.orig/mm/memcontrol.c
> +++ mmotm-2014-05-21-16-57/mm/memcontrol.c
> @@ -6723,15 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
>  					struct mm_walk *walk)
>  {
>  	struct vm_area_struct *vma = walk->vma;
> -	spinlock_t *ptl;
> -
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> -			mc.precharge += HPAGE_PMD_NR;
> -		spin_unlock(ptl);
> -		/* don't call mem_cgroup_count_precharge_pte() */
> -		walk->skip = 1;
> -	}
> +
> +	if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> +		mc.precharge += HPAGE_PMD_NR;
>  	return 0;
>  }
>  
> @@ -6952,38 +6946,21 @@ static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
>  	struct page *page;
>  	struct page_cgroup *pc;
>  
> -	/*
> -	 * We don't take compound_lock() here but no race with splitting thp
> -	 * happens because:
> -	 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is not
> -	 *    under splitting, which means there's no concurrent thp split,
> -	 *  - if another thread runs into split_huge_page() just after we
> -	 *    entered this if-block, the thread must wait for page table lock
> -	 *    to be unlocked in __split_huge_page_splitting(), where the main
> -	 *    part of thp split is not executed yet.
> -	 */
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		if (mc.precharge < HPAGE_PMD_NR) {
> -			spin_unlock(ptl);
> -			return 0;
> -		}
> -		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> -		if (target_type == MC_TARGET_PAGE) {
> -			page = target.page;
> -			if (!isolate_lru_page(page)) {
> -				pc = lookup_page_cgroup(page);
> -				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> -							pc, mc.from, mc.to)) {
> -					mc.precharge -= HPAGE_PMD_NR;
> -					mc.moved_charge += HPAGE_PMD_NR;
> -				}
> -				putback_lru_page(page);
> +	if (mc.precharge < HPAGE_PMD_NR)
> +		return 0;
> +	target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> +	if (target_type == MC_TARGET_PAGE) {
> +		page = target.page;
> +		if (!isolate_lru_page(page)) {
> +			pc = lookup_page_cgroup(page);
> +			if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> +						     pc, mc.from, mc.to)) {
> +				mc.precharge -= HPAGE_PMD_NR;
> +				mc.moved_charge += HPAGE_PMD_NR;
>  			}
> -			put_page(page);
> +			putback_lru_page(page);
>  		}
> -		spin_unlock(ptl);
> -		/* don't call mem_cgroup_move_charge_pte() */
> -		walk->skip = 1;
> +		put_page(page);
>  	}
>  	return 0;
>  }
> diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> index 24311d6f5c20..f1a3417d0b51 100644
> --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
>  			continue;
>  		}
>  
> -		if (walk->pmd_entry) {
> -			err = walk->pmd_entry(pmd, addr, next, walk);
> +		/*
> +		 * We don't take compound_lock() here but no race with splitting
> +		 * thp happens because:
> +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
> +		 *    not under splitting, which means there's no concurrent
> +		 *    thp split,
> +		 *  - if another thread runs into split_huge_page() just after
> +		 *    we entered this if-block, the thread must wait for page
> +		 *    table lock to be unlocked in __split_huge_page_splitting(),
> +		 *    where the main part of thp split is not executed yet.
> +		 */
> +		if (walk->pmd_entry && walk->vma) {
> +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
> +				err = walk->pmd_entry(pmd, addr, next, walk);
> +				spin_unlock(walk->ptl);
> +			}
>  			if (skip_lower_level_walking(walk))
>  				continue;
>  			if (err)
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
  2014-06-17 14:27     ` Jerome Marchand
@ 2014-06-17 15:01       ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-17 15:01 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
	Kirill A. Shutemov, linux-kernel

On Tue, Jun 17, 2014 at 04:27:56PM +0200, Jerome Marchand wrote:
> On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
> > Now all of current users of page table walker are canonicalized, i.e.
> > pmd_entry() handles only trans_pmd entry, and pte_entry() handles pte entry.
> > So we can factorize common code more.
> > This patch moves pmd_trans_huge_lock() in each pmd_entry() to pagewalk core.
> > 
> > ChangeLog v2:
> > - add null check walk->vma in walk_pmd_range()
> 
> An older version of this patch already made it to linux-next (commit
> b0e08c5) and I've actually hit the NULL pointer dereference.
> 
> Moreover, that patch (or maybe another recent pagewalk patch) breaks
> /proc/<pid>/smaps. All fields that should have been filled by
> smaps_pte() are almost always zero (and when it isn't, it's always a
> multiple of 2MB). It seems to me that the page walk never goes below
> pmd level.

Agreed. I'm now thinking that forcing pte_entry() for every user is not
a good idea, so I'll return to the starting point and do only the
necessary changes (i.e. only iron out the vma handling problem for hugepage).

Thanks,
Naoya Horiguchi

> Jerome
> 
> > - move comment update into a separate patch
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  arch/powerpc/mm/subpage-prot.c |  2 ++
> >  fs/proc/task_mmu.c             | 66 ++++++++++++++----------------------------
> >  mm/memcontrol.c                | 55 ++++++++++-------------------------
> >  mm/pagewalk.c                  | 18 ++++++++++--
> >  4 files changed, 55 insertions(+), 86 deletions(-)
> > 
> > diff --git mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> > index fa9fb5b4c66c..d0d94ac606f3 100644
> > --- mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c
> > +++ mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> > @@ -135,7 +135,9 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
> >  				  unsigned long end, struct mm_walk *walk)
> >  {
> >  	struct vm_area_struct *vma = walk->vma;
> > +	spin_unlock(walk->ptl);
> >  	split_huge_page_pmd(vma, addr, pmd);
> > +	spin_lock(walk->ptl);
> >  	return 0;
> >  }
> >  
> > diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> > index fa6d6a4e85b3..059206ea3c6b 100644
> > --- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
> > +++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> > @@ -496,15 +496,8 @@ static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
> >  			struct mm_walk *walk)
> >  {
> >  	struct mem_size_stats *mss = walk->private;
> > -	spinlock_t *ptl;
> > -
> > -	if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
> > -		smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> > -		spin_unlock(ptl);
> > -		mss->anonymous_thp += HPAGE_PMD_SIZE;
> > -		/* don't call smaps_pte() */
> > -		walk->skip = 1;
> > -	}
> > +	smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> > +	mss->anonymous_thp += HPAGE_PMD_SIZE;
> >  	return 0;
> >  }
> >  
> > @@ -983,31 +976,21 @@ static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
> >  	struct vm_area_struct *vma = walk->vma;
> >  	struct pagemapread *pm = walk->private;
> >  	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> > -	spinlock_t *ptl;
> > -
> > -	if (!vma)
> > -		return err;
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		int pmd_flags2;
> > +	int pmd_flags2;
> >  
> > -		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> > -			pmd_flags2 = __PM_SOFT_DIRTY;
> > -		else
> > -			pmd_flags2 = 0;
> > +	if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> > +		pmd_flags2 = __PM_SOFT_DIRTY;
> > +	else
> > +		pmd_flags2 = 0;
> >  
> > -		for (; addr != end; addr += PAGE_SIZE) {
> > -			unsigned long offset;
> > +	for (; addr != end; addr += PAGE_SIZE) {
> > +		unsigned long offset;
> >  
> > -			offset = (addr & ~PAGEMAP_WALK_MASK) >>
> > -					PAGE_SHIFT;
> > -			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> > -			err = add_to_pagemap(addr, &pme, pm);
> > -			if (err)
> > -				break;
> > -		}
> > -		spin_unlock(ptl);
> > -		/* don't call pagemap_pte() */
> > -		walk->skip = 1;
> > +		offset = (addr & ~PAGEMAP_WALK_MASK) >> PAGE_SHIFT;
> > +		thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> > +		err = add_to_pagemap(addr, &pme, pm);
> > +		if (err)
> > +			break;
> >  	}
> >  	return err;
> >  }
> > @@ -1277,20 +1260,13 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
> >  {
> >  	struct numa_maps *md = walk->private;
> >  	struct vm_area_struct *vma = walk->vma;
> > -	spinlock_t *ptl;
> > -
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		pte_t huge_pte = *(pte_t *)pmd;
> > -		struct page *page;
> > -
> > -		page = can_gather_numa_stats(huge_pte, vma, addr);
> > -		if (page)
> > -			gather_stats(page, md, pte_dirty(huge_pte),
> > -				     HPAGE_PMD_SIZE/PAGE_SIZE);
> > -		spin_unlock(ptl);
> > -		/* don't call gather_pte_stats() */
> > -		walk->skip = 1;
> > -	}
> > +	pte_t huge_pte = *(pte_t *)pmd;
> > +	struct page *page;
> > +
> > +	page = can_gather_numa_stats(huge_pte, vma, addr);
> > +	if (page)
> > +		gather_stats(page, md, pte_dirty(huge_pte),
> > +			     HPAGE_PMD_SIZE/PAGE_SIZE);
> >  	return 0;
> >  }
> >  #ifdef CONFIG_HUGETLB_PAGE
> > diff --git mmotm-2014-05-21-16-57.orig/mm/memcontrol.c mmotm-2014-05-21-16-57/mm/memcontrol.c
> > index 01a66a208769..bb987cb9e043 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/memcontrol.c
> > +++ mmotm-2014-05-21-16-57/mm/memcontrol.c
> > @@ -6723,15 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
> >  					struct mm_walk *walk)
> >  {
> >  	struct vm_area_struct *vma = walk->vma;
> > -	spinlock_t *ptl;
> > -
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> > -			mc.precharge += HPAGE_PMD_NR;
> > -		spin_unlock(ptl);
> > -		/* don't call mem_cgroup_count_precharge_pte() */
> > -		walk->skip = 1;
> > -	}
> > +
> > +	if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> > +		mc.precharge += HPAGE_PMD_NR;
> >  	return 0;
> >  }
> >  
> > @@ -6952,38 +6946,21 @@ static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
> >  	struct page *page;
> >  	struct page_cgroup *pc;
> >  
> > -	/*
> > -	 * We don't take compound_lock() here but no race with splitting thp
> > -	 * happens because:
> > -	 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is not
> > -	 *    under splitting, which means there's no concurrent thp split,
> > -	 *  - if another thread runs into split_huge_page() just after we
> > -	 *    entered this if-block, the thread must wait for page table lock
> > -	 *    to be unlocked in __split_huge_page_splitting(), where the main
> > -	 *    part of thp split is not executed yet.
> > -	 */
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		if (mc.precharge < HPAGE_PMD_NR) {
> > -			spin_unlock(ptl);
> > -			return 0;
> > -		}
> > -		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> > -		if (target_type == MC_TARGET_PAGE) {
> > -			page = target.page;
> > -			if (!isolate_lru_page(page)) {
> > -				pc = lookup_page_cgroup(page);
> > -				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> > -							pc, mc.from, mc.to)) {
> > -					mc.precharge -= HPAGE_PMD_NR;
> > -					mc.moved_charge += HPAGE_PMD_NR;
> > -				}
> > -				putback_lru_page(page);
> > +	if (mc.precharge < HPAGE_PMD_NR)
> > +		return 0;
> > +	target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> > +	if (target_type == MC_TARGET_PAGE) {
> > +		page = target.page;
> > +		if (!isolate_lru_page(page)) {
> > +			pc = lookup_page_cgroup(page);
> > +			if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> > +						     pc, mc.from, mc.to)) {
> > +				mc.precharge -= HPAGE_PMD_NR;
> > +				mc.moved_charge += HPAGE_PMD_NR;
> >  			}
> > -			put_page(page);
> > +			putback_lru_page(page);
> >  		}
> > -		spin_unlock(ptl);
> > -		/* don't call mem_cgroup_move_charge_pte() */
> > -		walk->skip = 1;
> > +		put_page(page);
> >  	}
> >  	return 0;
> >  }
> > diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> > index 24311d6f5c20..f1a3417d0b51 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> > +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> > @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
> >  			continue;
> >  		}
> >  
> > -		if (walk->pmd_entry) {
> > -			err = walk->pmd_entry(pmd, addr, next, walk);
> > +		/*
> > +		 * We don't take compound_lock() here but no race with splitting
> > +		 * thp happens because:
> > +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
> > +		 *    not under splitting, which means there's no concurrent
> > +		 *    thp split,
> > +		 *  - if another thread runs into split_huge_page() just after
> > +		 *    we entered this if-block, the thread must wait for page
> > +		 *    table lock to be unlocked in __split_huge_page_splitting(),
> > +		 *    where the main part of thp split is not executed yet.
> > +		 */
> > +		if (walk->pmd_entry && walk->vma) {
> > +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
> > +				err = walk->pmd_entry(pmd, addr, next, walk);
> > +				spin_unlock(walk->ptl);
> > +			}
> >  			if (skip_lower_level_walking(walk))
> >  				continue;
> >  			if (err)
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
@ 2014-06-17 15:01       ` Naoya Horiguchi
  0 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-17 15:01 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
	Kirill A. Shutemov, linux-kernel

On Tue, Jun 17, 2014 at 04:27:56PM +0200, Jerome Marchand wrote:
> On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
> > Now all of current users of page table walker are canonicalized, i.e.
> > pmd_entry() handles only trans_pmd entry, and pte_entry() handles pte entry.
> > So we can factorize common code more.
> > This patch moves pmd_trans_huge_lock() in each pmd_entry() to pagewalk core.
> > 
> > ChangeLog v2:
> > - add null check walk->vma in walk_pmd_range()
> 
> An older version of this patch already made it to linux-next (commit
> b0e08c5) and I've actually hit the NULL pointer dereference.
> 
> Moreover, that patch (or maybe another recent pagewalk patch) breaks
> /proc/<pid>/smaps. All fields that should have been filled by
> smaps_pte() are almost always zero (and when it isn't, it's always a
> multiple of 2MB). It seems to me that the page walk never goes below
> pmd level.

Agreed. I'm now thinking that forcing pte_entry() for every user is not
a good idea, so I'll return to the starting point and do only the
necessary changes (i.e. only iron out the vma handling problem for hugepage).

Thanks,
Naoya Horiguchi

> Jerome
> 
> > - move comment update into a separate patch
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  arch/powerpc/mm/subpage-prot.c |  2 ++
> >  fs/proc/task_mmu.c             | 66 ++++++++++++++----------------------------
> >  mm/memcontrol.c                | 55 ++++++++++-------------------------
> >  mm/pagewalk.c                  | 18 ++++++++++--
> >  4 files changed, 55 insertions(+), 86 deletions(-)
> > 
> > diff --git mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> > index fa9fb5b4c66c..d0d94ac606f3 100644
> > --- mmotm-2014-05-21-16-57.orig/arch/powerpc/mm/subpage-prot.c
> > +++ mmotm-2014-05-21-16-57/arch/powerpc/mm/subpage-prot.c
> > @@ -135,7 +135,9 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
> >  				  unsigned long end, struct mm_walk *walk)
> >  {
> >  	struct vm_area_struct *vma = walk->vma;
> > +	spin_unlock(walk->ptl);
> >  	split_huge_page_pmd(vma, addr, pmd);
> > +	spin_lock(walk->ptl);
> >  	return 0;
> >  }
> >  
> > diff --git mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> > index fa6d6a4e85b3..059206ea3c6b 100644
> > --- mmotm-2014-05-21-16-57.orig/fs/proc/task_mmu.c
> > +++ mmotm-2014-05-21-16-57/fs/proc/task_mmu.c
> > @@ -496,15 +496,8 @@ static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
> >  			struct mm_walk *walk)
> >  {
> >  	struct mem_size_stats *mss = walk->private;
> > -	spinlock_t *ptl;
> > -
> > -	if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
> > -		smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> > -		spin_unlock(ptl);
> > -		mss->anonymous_thp += HPAGE_PMD_SIZE;
> > -		/* don't call smaps_pte() */
> > -		walk->skip = 1;
> > -	}
> > +	smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk);
> > +	mss->anonymous_thp += HPAGE_PMD_SIZE;
> >  	return 0;
> >  }
> >  
> > @@ -983,31 +976,21 @@ static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
> >  	struct vm_area_struct *vma = walk->vma;
> >  	struct pagemapread *pm = walk->private;
> >  	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> > -	spinlock_t *ptl;
> > -
> > -	if (!vma)
> > -		return err;
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		int pmd_flags2;
> > +	int pmd_flags2;
> >  
> > -		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> > -			pmd_flags2 = __PM_SOFT_DIRTY;
> > -		else
> > -			pmd_flags2 = 0;
> > +	if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> > +		pmd_flags2 = __PM_SOFT_DIRTY;
> > +	else
> > +		pmd_flags2 = 0;
> >  
> > -		for (; addr != end; addr += PAGE_SIZE) {
> > -			unsigned long offset;
> > +	for (; addr != end; addr += PAGE_SIZE) {
> > +		unsigned long offset;
> >  
> > -			offset = (addr & ~PAGEMAP_WALK_MASK) >>
> > -					PAGE_SHIFT;
> > -			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> > -			err = add_to_pagemap(addr, &pme, pm);
> > -			if (err)
> > -				break;
> > -		}
> > -		spin_unlock(ptl);
> > -		/* don't call pagemap_pte() */
> > -		walk->skip = 1;
> > +		offset = (addr & ~PAGEMAP_WALK_MASK) >> PAGE_SHIFT;
> > +		thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> > +		err = add_to_pagemap(addr, &pme, pm);
> > +		if (err)
> > +			break;
> >  	}
> >  	return err;
> >  }
> > @@ -1277,20 +1260,13 @@ static int gather_pmd_stats(pmd_t *pmd, unsigned long addr,
> >  {
> >  	struct numa_maps *md = walk->private;
> >  	struct vm_area_struct *vma = walk->vma;
> > -	spinlock_t *ptl;
> > -
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		pte_t huge_pte = *(pte_t *)pmd;
> > -		struct page *page;
> > -
> > -		page = can_gather_numa_stats(huge_pte, vma, addr);
> > -		if (page)
> > -			gather_stats(page, md, pte_dirty(huge_pte),
> > -				     HPAGE_PMD_SIZE/PAGE_SIZE);
> > -		spin_unlock(ptl);
> > -		/* don't call gather_pte_stats() */
> > -		walk->skip = 1;
> > -	}
> > +	pte_t huge_pte = *(pte_t *)pmd;
> > +	struct page *page;
> > +
> > +	page = can_gather_numa_stats(huge_pte, vma, addr);
> > +	if (page)
> > +		gather_stats(page, md, pte_dirty(huge_pte),
> > +			     HPAGE_PMD_SIZE/PAGE_SIZE);
> >  	return 0;
> >  }
> >  #ifdef CONFIG_HUGETLB_PAGE
> > diff --git mmotm-2014-05-21-16-57.orig/mm/memcontrol.c mmotm-2014-05-21-16-57/mm/memcontrol.c
> > index 01a66a208769..bb987cb9e043 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/memcontrol.c
> > +++ mmotm-2014-05-21-16-57/mm/memcontrol.c
> > @@ -6723,15 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd,
> >  					struct mm_walk *walk)
> >  {
> >  	struct vm_area_struct *vma = walk->vma;
> > -	spinlock_t *ptl;
> > -
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> > -			mc.precharge += HPAGE_PMD_NR;
> > -		spin_unlock(ptl);
> > -		/* don't call mem_cgroup_count_precharge_pte() */
> > -		walk->skip = 1;
> > -	}
> > +
> > +	if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> > +		mc.precharge += HPAGE_PMD_NR;
> >  	return 0;
> >  }
> >  
> > @@ -6952,38 +6946,21 @@ static int mem_cgroup_move_charge_pmd(pmd_t *pmd,
> >  	struct page *page;
> >  	struct page_cgroup *pc;
> >  
> > -	/*
> > -	 * We don't take compound_lock() here but no race with splitting thp
> > -	 * happens because:
> > -	 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is not
> > -	 *    under splitting, which means there's no concurrent thp split,
> > -	 *  - if another thread runs into split_huge_page() just after we
> > -	 *    entered this if-block, the thread must wait for page table lock
> > -	 *    to be unlocked in __split_huge_page_splitting(), where the main
> > -	 *    part of thp split is not executed yet.
> > -	 */
> > -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> > -		if (mc.precharge < HPAGE_PMD_NR) {
> > -			spin_unlock(ptl);
> > -			return 0;
> > -		}
> > -		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> > -		if (target_type == MC_TARGET_PAGE) {
> > -			page = target.page;
> > -			if (!isolate_lru_page(page)) {
> > -				pc = lookup_page_cgroup(page);
> > -				if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> > -							pc, mc.from, mc.to)) {
> > -					mc.precharge -= HPAGE_PMD_NR;
> > -					mc.moved_charge += HPAGE_PMD_NR;
> > -				}
> > -				putback_lru_page(page);
> > +	if (mc.precharge < HPAGE_PMD_NR)
> > +		return 0;
> > +	target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> > +	if (target_type == MC_TARGET_PAGE) {
> > +		page = target.page;
> > +		if (!isolate_lru_page(page)) {
> > +			pc = lookup_page_cgroup(page);
> > +			if (!mem_cgroup_move_account(page, HPAGE_PMD_NR,
> > +						     pc, mc.from, mc.to)) {
> > +				mc.precharge -= HPAGE_PMD_NR;
> > +				mc.moved_charge += HPAGE_PMD_NR;
> >  			}
> > -			put_page(page);
> > +			putback_lru_page(page);
> >  		}
> > -		spin_unlock(ptl);
> > -		/* don't call mem_cgroup_move_charge_pte() */
> > -		walk->skip = 1;
> > +		put_page(page);
> >  	}
> >  	return 0;
> >  }
> > diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> > index 24311d6f5c20..f1a3417d0b51 100644
> > --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> > +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> > @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
> >  			continue;
> >  		}
> >  
> > -		if (walk->pmd_entry) {
> > -			err = walk->pmd_entry(pmd, addr, next, walk);
> > +		/*
> > +		 * We don't take compound_lock() here but no race with splitting
> > +		 * thp happens because:
> > +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
> > +		 *    not under splitting, which means there's no concurrent
> > +		 *    thp split,
> > +		 *  - if another thread runs into split_huge_page() just after
> > +		 *    we entered this if-block, the thread must wait for page
> > +		 *    table lock to be unlocked in __split_huge_page_splitting(),
> > +		 *    where the main part of thp split is not executed yet.
> > +		 */
> > +		if (walk->pmd_entry && walk->vma) {
> > +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
> > +				err = walk->pmd_entry(pmd, addr, next, walk);
> > +				spin_unlock(walk->ptl);
> > +			}
> >  			if (skip_lower_level_walking(walk))
> >  				continue;
> >  			if (err)
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
  2014-06-17 15:01       ` Naoya Horiguchi
@ 2014-06-18 15:13         ` Jerome Marchand
  -1 siblings, 0 replies; 42+ messages in thread
From: Jerome Marchand @ 2014-06-18 15:13 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
	Kirill A. Shutemov, linux-kernel

On 06/17/2014 05:01 PM, Naoya Horiguchi wrote:
> On Tue, Jun 17, 2014 at 04:27:56PM +0200, Jerome Marchand wrote:
>> On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
>>> Now all of current users of page table walker are canonicalized, i.e.
>>> pmd_entry() handles only trans_pmd entry, and pte_entry() handles pte entry.
>>> So we can factorize common code more.
>>> This patch moves pmd_trans_huge_lock() in each pmd_entry() to pagewalk core.
>>>
>>> ChangeLog v2:
>>> - add null check walk->vma in walk_pmd_range()
>>
>> An older version of this patch already made it to linux-next (commit
>> b0e08c5) and I've actually hit the NULL pointer dereference.
>>
>> Moreover, that patch (or maybe another recent pagewalk patch) breaks
>> /proc/<pid>/smaps. All fields that should have been filled by
>> smaps_pte() are almost always zero (and when it isn't, it's always a
>> multiple of 2MB). It seems to me that the page walk never goes below
>> pmd level.
> 
> Agreed. I'm now thinking that forcing pte_entry() for every user is not
> a good idea, so I'll return to the starting point and do only the
> necessary changes (i.e. only iron out the vma handling problem for hugepage).
> 
> Thanks,
> Naoya Horiguchi
> 
>> Jerome
>>
>>> - move comment update into a separate patch
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> ---


>>> diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
>>> index 24311d6f5c20..f1a3417d0b51 100644
>>> --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
>>> +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
>>> @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
>>>  			continue;
>>>  		}
>>>  
>>> -		if (walk->pmd_entry) {
>>> -			err = walk->pmd_entry(pmd, addr, next, walk);
>>> +		/*
>>> +		 * We don't take compound_lock() here but no race with splitting
>>> +		 * thp happens because:
>>> +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
>>> +		 *    not under splitting, which means there's no concurrent
>>> +		 *    thp split,
>>> +		 *  - if another thread runs into split_huge_page() just after
>>> +		 *    we entered this if-block, the thread must wait for page
>>> +		 *    table lock to be unlocked in __split_huge_page_splitting(),
>>> +		 *    where the main part of thp split is not executed yet.
>>> +		 */
>>> +		if (walk->pmd_entry && walk->vma) {
>>> +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
>>> +				err = walk->pmd_entry(pmd, addr, next, walk);
>>> +				spin_unlock(walk->ptl);
>>> +			}
>>>  			if (skip_lower_level_walking(walk))
>>>  				continue;
>>>  			if (err)

This is the cause of the smaps trouble. This code modifies walk->control
when pmd_entry() is present, even when it is not called. All the control
code should depend on pmd_trans_huge_lock() == 1 too.
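
Something like the following is what I mean (an untested sketch, just to
illustrate the point):

	if (walk->pmd_entry && walk->vma) {
		if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
			err = walk->pmd_entry(pmd, addr, next, walk);
			spin_unlock(walk->ptl);
			/*
			 * pmd_entry() actually ran only in this branch, so
			 * only here may it request skipping the pte level.
			 */
			if (skip_lower_level_walking(walk))
				continue;
			if (err)
				break;
		}
	}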

Jerome

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
@ 2014-06-18 15:13         ` Jerome Marchand
  0 siblings, 0 replies; 42+ messages in thread
From: Jerome Marchand @ 2014-06-18 15:13 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
	Kirill A. Shutemov, linux-kernel

On 06/17/2014 05:01 PM, Naoya Horiguchi wrote:
> On Tue, Jun 17, 2014 at 04:27:56PM +0200, Jerome Marchand wrote:
>> On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
>>> Now all of current users of page table walker are canonicalized, i.e.
>>> pmd_entry() handles only trans_pmd entry, and pte_entry() handles pte entry.
>>> So we can factorize common code more.
>>> This patch moves pmd_trans_huge_lock() in each pmd_entry() to pagewalk core.
>>>
>>> ChangeLog v2:
>>> - add null check walk->vma in walk_pmd_range()
>>
>> An older version of this patch already made it to linux-next (commit
>> b0e08c5) and I've actually hit the NULL pointer dereference.
>>
>> Moreover, that patch (or maybe another recent pagewalk patch) breaks
>> /proc/<pid>/smaps. All fields that should have been filled by
>> smaps_pte() are almost always zero (and when it isn't, it's always a
>> multiple of 2MB). It seems to me that the page walk never goes below
>> pmd level.
> 
> Agreed. I'm now thinking that forcing pte_entry() for every user is not
> a good idea, so I'll return to the starting point and do only the
> necessary changes (i.e. only iron out the vma handling problem for hugepage).
> 
> Thanks,
> Naoya Horiguchi
> 
>> Jerome
>>
>>> - move comment update into a separate patch
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> ---


>>> diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
>>> index 24311d6f5c20..f1a3417d0b51 100644
>>> --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
>>> +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
>>> @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
>>>  			continue;
>>>  		}
>>>  
>>> -		if (walk->pmd_entry) {
>>> -			err = walk->pmd_entry(pmd, addr, next, walk);
>>> +		/*
>>> +		 * We don't take compound_lock() here but no race with splitting
>>> +		 * thp happens because:
>>> +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
>>> +		 *    not under splitting, which means there's no concurrent
>>> +		 *    thp split,
>>> +		 *  - if another thread runs into split_huge_page() just after
>>> +		 *    we entered this if-block, the thread must wait for page
>>> +		 *    table lock to be unlocked in __split_huge_page_splitting(),
>>> +		 *    where the main part of thp split is not executed yet.
>>> +		 */
>>> +		if (walk->pmd_entry && walk->vma) {
>>> +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
>>> +				err = walk->pmd_entry(pmd, addr, next, walk);
>>> +				spin_unlock(walk->ptl);
>>> +			}
>>>  			if (skip_lower_level_walking(walk))
>>>  				continue;
>>>  			if (err)

This is the cause of the smaps trouble. This code modifies walk->control
when pmd_entry() is present, even when it is not called. All the control
code should depend on pmd_trans_huge_lock() == 1 too.

Jerome

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
  2014-06-18 15:13         ` Jerome Marchand
@ 2014-06-18 15:31           ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2014-06-18 15:31 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, Dave Hansen, Andrew Morton, Hugh Dickins,
	Kirill A. Shutemov, linux-kernel

On Wed, Jun 18, 2014 at 05:13:53PM +0200, Jerome Marchand wrote:
> On 06/17/2014 05:01 PM, Naoya Horiguchi wrote:
> > On Tue, Jun 17, 2014 at 04:27:56PM +0200, Jerome Marchand wrote:
> >> On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
> >>> Now all of current users of page table walker are canonicalized, i.e.
> >>> pmd_entry() handles only trans_pmd entry, and pte_entry() handles pte entry.
> >>> So we can factorize common code more.
> >>> This patch moves pmd_trans_huge_lock() in each pmd_entry() to pagewalk core.
> >>>
> >>> ChangeLog v2:
> >>> - add null check walk->vma in walk_pmd_range()
> >>
> >> An older version of this patch already made it to linux-next (commit
> >> b0e08c5) and I've actually hit the NULL pointer dereference.
> >>
> >> Moreover, that patch (or maybe another recent pagewalk patch) breaks
> >> /proc/<pid>/smaps. All fields that should have been filled by
> >> smaps_pte() are almost always zero (and when it isn't, it's always a
> >> multiple of 2MB). It seems to me that the page walk never goes below
> >> pmd level.
> > 
> > Agreed. I'm now thinking that forcing pte_entry() for every user is not
> > a good idea, so I'll return to the starting point and do only the
> > necessary changes (i.e. only iron out the vma handling problem for hugepage).
> > 
> > Thanks,
> > Naoya Horiguchi
> > 
> >> Jerome
> >>
> >>> - move comment update into a separate patch
> >>>
> >>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >>> ---
> 
> 
> >>> diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
> >>> index 24311d6f5c20..f1a3417d0b51 100644
> >>> --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
> >>> +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
> >>> @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
> >>>  			continue;
> >>>  		}
> >>>  
> >>> -		if (walk->pmd_entry) {
> >>> -			err = walk->pmd_entry(pmd, addr, next, walk);
> >>> +		/*
> >>> +		 * We don't take compound_lock() here but no race with splitting
> >>> +		 * thp happens because:
> >>> +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
> >>> +		 *    not under splitting, which means there's no concurrent
> >>> +		 *    thp split,
> >>> +		 *  - if another thread runs into split_huge_page() just after
> >>> +		 *    we entered this if-block, the thread must wait for page
> >>> +		 *    table lock to be unlocked in __split_huge_page_splitting(),
> >>> +		 *    where the main part of thp split is not executed yet.
> >>> +		 */
> >>> +		if (walk->pmd_entry && walk->vma) {
> >>> +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
> >>> +				err = walk->pmd_entry(pmd, addr, next, walk);
> >>> +				spin_unlock(walk->ptl);
> >>> +			}
> >>>  			if (skip_lower_level_walking(walk))
> >>>  				continue;
> >>>  			if (err)
> 
> This is the cause of the smaps trouble. This code modifies walk->control
> when pmd_entry() is present, even when it is not called. All the control
> code should depend on pmd_trans_huge_lock() == 1 too.

Thank you for pointing this out. I have a few objections to doing aggressive
cleanup around this code, so I'm now preparing the next version, which does
only minimal cleanup without the walk->control stuff; I hope that will
address your concern.
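
For illustration only (this is not code from the posted series or the next
version; example_pmd_entry() and its body are made-up, and it assumes the
walk->vma field introduced earlier in this series with pte_entry left NULL),
a walker callback that keeps the thp locking to itself, in the style the
pre-series walkers used, would look roughly like this:

/* Hypothetical pmd_entry() callback that does its own thp locking. */
static int example_pmd_entry(pmd_t *pmd, unsigned long addr,
			     unsigned long end, struct mm_walk *walk)
{
	struct vm_area_struct *vma = walk->vma;
	spinlock_t *ptl;
	pte_t *pte;

	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
		/* ... handle the whole huge pmd here ... */
		spin_unlock(ptl);
		return 0;
	}

	if (pmd_trans_unstable(pmd))
		return 0;

	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
	for (; addr != end; pte++, addr += PAGE_SIZE) {
		/* ... handle one pte entry ... */
	}
	pte_unmap_unlock(pte - 1, ptl);
	cond_resched();
	return 0;
}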

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2014-06-18 15:31 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-12 21:48 [PATCH -mm v2 00/11] pagewalk: standardize current users, move pmd locking, apply to mincore Naoya Horiguchi
2014-06-12 21:48 ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 01/11] pagewalk: remove pgd_entry() and pud_entry() Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 02/11] madvise: cleanup swapin_walk_pmd_entry() Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-15 20:24   ` Hugh Dickins
2014-06-15 20:24     ` Hugh Dickins
2014-06-16 15:59     ` Naoya Horiguchi
2014-06-16 15:59       ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 03/11] memcg: separate mem_cgroup_move_charge_pte_range() Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-17 14:27   ` Jerome Marchand
2014-06-17 14:27     ` Jerome Marchand
2014-06-17 15:01     ` Naoya Horiguchi
2014-06-17 15:01       ` Naoya Horiguchi
2014-06-18 15:13       ` Jerome Marchand
2014-06-18 15:13         ` Jerome Marchand
2014-06-18 15:31         ` Naoya Horiguchi
2014-06-18 15:31           ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 05/11] pagewalk: remove mm_walk->skip Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 06/11] pagewalk: add size to struct mm_walk Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 22:07   ` Dave Hansen
2014-06-12 22:07     ` Dave Hansen
2014-06-12 22:36     ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 07/11] pagewalk: change type of arg of callbacks Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 08/11] pagewalk: update comment on walk_page_range() Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 09/11] fs/proc/task_mmu.c: refactor smaps Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 10/11] fs/proc/task_mmu.c: clean up gather_*_stats() Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:48 ` [PATCH -mm v2 11/11] mincore: apply page table walker on do_mincore() Naoya Horiguchi
2014-06-12 21:48   ` Naoya Horiguchi
2014-06-12 21:56 ` [PATCH -mm v2 00/11] pagewalk: standardize current users, move pmd locking, apply to mincore Andrew Morton
2014-06-12 21:56   ` Andrew Morton
2014-06-12 22:21   ` Naoya Horiguchi
