* [PATCH 0/5] Refactor do_wp_page, no functional change
@ 2014-12-01 12:04 Shachar Raindel
  2014-12-01 12:04 ` [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Shachar Raindel @ 2014-12-01 12:04 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

Currently do_wp_page contains 265 lines of code. It also contains 9 goto
statements, of which 5 target labels that are not cleanup
related. This makes the function extremely difficult to
understand. The following patches are an attempt at breaking the
function into its basic components, making it easier to understand.

The first 4 patches are straightforward function extractions from
do_wp_page. As we extract functions, we remove unneeded parameters and
simplify the code as much as possible. However, the functionality is
supposed to remain completely unchanged. The patches also attempt to
document the functionality of each extracted function.
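
As an aside for reviewers unfamiliar with the pattern: the extraction
these patches perform turns a goto-target tail into a helper that
callers return through directly. A toy userspace sketch (illustrative
only, not kernel code; all names here are hypothetical stand-ins):

```c
#include <assert.h>

#define FAULT_WRITE 0x1	/* stand-in for the kernel's VM_FAULT_WRITE */

struct fault_ctx {
	int writable;	/* the "pte" has been marked writable */
	int unlocked;	/* the "ptl" has been released */
};

/* The code that used to sit under a "reuse:" label becomes a helper
 * that finishes the fault and returns the result code itself. */
static int fault_reuse(struct fault_ctx *ctx)
{
	ctx->writable = 1;
	ctx->unlocked = 1;
	return FAULT_WRITE;
}

static int handle_fault(struct fault_ctx *ctx, int shared_writable)
{
	if (shared_writable)
		return fault_reuse(ctx);	/* was: goto reuse; */
	/* ... the copy path would go here ... */
	ctx->unlocked = 1;
	return 0;
}
```

The callers lose the jump and gain an ordinary tail call, which is the
whole readability win the series is after.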

The last patch moves the MMU notifier call. Originally, it was
conditionally called from the unlock step. The patch moves it to the
only call site that sets up the conditions for calling the notifier.
This results in a minor functional change - the notifier for the end
of the invalidation is called after we release the page-cache
reference to the old page, and not before. Given that the notifier is
for the end of the invalidation period, this should be OK for all
users of the MMU notifiers, who should not be touching the relevant
page anyway.

The patches have been tested using trinity on a KVM machine with 4
vCPUs, with all possible kernel debugging options enabled. So far, we
have not seen any regressions. We have also tested the patches with
our internal tests that stress the MMU notifiers, again without
seeing any issues.

Shachar Raindel (5):
  mm: Refactor do_wp_page, extract the reuse case
  mm: Refactor do_wp_page - extract the unlock flow
  mm: refactor do_wp_page, extract the page copy flow
  mm: Refactor do_wp_page handling of shared vma into a function
  mm: Move the MMU-notifier code from wp_page_unlock to wp_page_copy

 mm/memory.c | 418 +++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 247 insertions(+), 171 deletions(-)

-- 
1.7.11.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


* [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case
  2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
@ 2014-12-01 12:04 ` Shachar Raindel
  2014-12-01 12:30   ` Kirill A. Shutemov
  2014-12-01 12:04 ` [PATCH 2/5] mm: Refactor do_wp_page - extract the unlock flow Shachar Raindel
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Shachar Raindel @ 2014-12-01 12:04 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

When do_wp_page is ending, in several cases it needs to reuse the
existing page. This is achieved by making the page table entry
writable, and possibly updating the page-cache state.

Currently, this logic is "called" by jumping to it with a goto, which
makes the control flow of the function harder to follow. It is also
against the coding style guidelines on goto usage.

As the code can easily be refactored into a specialized function,
refactor it out and simplify the code flow in do_wp_page.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 136 ++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 78 insertions(+), 58 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3e50383..61334e9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2020,6 +2020,75 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 }
 
 /*
+ * Handle write page faults for pages that can be reused in the current vma
+ *
+ * This can happen either due to the mapping having the VM_SHARED flag set,
+ * or due to us holding the last remaining reference to the page. In either
+ * case, all we need to do here is mark the page as writable and update
+ * any related book-keeping.
+ */
+static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
+			 unsigned long address, pte_t *page_table,
+			 spinlock_t *ptl, pte_t orig_pte,
+			 struct page *recycled_page, int dirty_page,
+			 int page_mkwrite)
+	__releases(ptl)
+{
+	pte_t entry;
+	/*
+	 * Clear the page's cpupid information as the existing
+	 * information potentially belongs to a now completely
+	 * unrelated process.
+	 */
+	if (recycled_page)
+		page_cpupid_xchg_last(recycled_page,
+				      (1 << LAST_CPUPID_SHIFT) - 1);
+
+	flush_cache_page(vma, address, pte_pfn(orig_pte));
+	entry = pte_mkyoung(orig_pte);
+	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	if (ptep_set_access_flags(vma, address, page_table, entry, 1))
+		update_mmu_cache(vma, address, page_table);
+	pte_unmap_unlock(page_table, ptl);
+
+	if (!dirty_page || !recycled_page)
+		return VM_FAULT_WRITE;
+
+	/*
+	 * Yes, Virginia, this is actually required to prevent a race
+	 * with clear_page_dirty_for_io() from clearing the page dirty
+	 * bit after it clears all dirty ptes, but before a racing
+	 * do_wp_page installs a dirty pte.
+	 *
+	 * do_shared_fault is protected similarly.
+	 */
+	if (!page_mkwrite) {
+		wait_on_page_locked(recycled_page);
+		set_page_dirty_balance(recycled_page);
+		/* file_update_time outside page_lock */
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
+	}
+	put_page(recycled_page);
+	if (page_mkwrite) {
+		struct address_space *mapping = recycled_page->mapping;
+
+		set_page_dirty(recycled_page);
+		unlock_page(recycled_page);
+		page_cache_release(recycled_page);
+		if (mapping)	{
+			/*
+			 * Some device drivers do not set page.mapping
+			 * but still dirty their pages
+			 */
+			balance_dirty_pages_ratelimited(mapping);
+		}
+	}
+
+	return VM_FAULT_WRITE;
+}
+
+/*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
  * and decrementing the shared-page counter for the old page.
@@ -2045,8 +2114,6 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *old_page, *new_page = NULL;
 	pte_t entry;
 	int ret = 0;
-	int page_mkwrite = 0;
-	struct page *dirty_page = NULL;
 	unsigned long mmun_start = 0;	/* For mmu_notifiers */
 	unsigned long mmun_end = 0;	/* For mmu_notifiers */
 	struct mem_cgroup *memcg;
@@ -2063,7 +2130,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 				     (VM_WRITE|VM_SHARED))
-			goto reuse;
+			return wp_page_reuse(mm, vma, address, page_table, ptl,
+					     orig_pte, old_page, 0, 0);
 		goto gotten;
 	}
 
@@ -2092,11 +2160,14 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			 */
 			page_move_anon_rmap(old_page, vma, address);
 			unlock_page(old_page);
-			goto reuse;
+			return wp_page_reuse(mm, vma, address, page_table, ptl,
+					     orig_pte, old_page, 0, 0);
 		}
 		unlock_page(old_page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
+		int page_mkwrite = 0;
+
 		/*
 		 * Only catch write-faults on shared writable pages,
 		 * read-only shared pages can get COWed by
@@ -2127,61 +2198,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 			page_mkwrite = 1;
 		}
-		dirty_page = old_page;
-		get_page(dirty_page);
-
-reuse:
-		/*
-		 * Clear the pages cpupid information as the existing
-		 * information potentially belongs to a now completely
-		 * unrelated process.
-		 */
-		if (old_page)
-			page_cpupid_xchg_last(old_page, (1 << LAST_CPUPID_SHIFT) - 1);
-
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = pte_mkyoung(orig_pte);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-		if (ptep_set_access_flags(vma, address, page_table, entry,1))
-			update_mmu_cache(vma, address, page_table);
-		pte_unmap_unlock(page_table, ptl);
-		ret |= VM_FAULT_WRITE;
-
-		if (!dirty_page)
-			return ret;
+		get_page(old_page);
 
-		/*
-		 * Yes, Virginia, this is actually required to prevent a race
-		 * with clear_page_dirty_for_io() from clearing the page dirty
-		 * bit after it clear all dirty ptes, but before a racing
-		 * do_wp_page installs a dirty pte.
-		 *
-		 * do_shared_fault is protected similarly.
-		 */
-		if (!page_mkwrite) {
-			wait_on_page_locked(dirty_page);
-			set_page_dirty_balance(dirty_page);
-			/* file_update_time outside page_lock */
-			if (vma->vm_file)
-				file_update_time(vma->vm_file);
-		}
-		put_page(dirty_page);
-		if (page_mkwrite) {
-			struct address_space *mapping = dirty_page->mapping;
-
-			set_page_dirty(dirty_page);
-			unlock_page(dirty_page);
-			page_cache_release(dirty_page);
-			if (mapping)	{
-				/*
-				 * Some device drivers do not set page.mapping
-				 * but still dirty their pages
-				 */
-				balance_dirty_pages_ratelimited(mapping);
-			}
-		}
-
-		return ret;
+		return wp_page_reuse(mm, vma, address, page_table, ptl,
+				     orig_pte, old_page, 1, page_mkwrite);
 	}
 
 	/*
-- 
1.7.11.2



* [PATCH 2/5] mm: Refactor do_wp_page - extract the unlock flow
  2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
  2014-12-01 12:04 ` [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
@ 2014-12-01 12:04 ` Shachar Raindel
  2014-12-01 12:43   ` Kirill A. Shutemov
  2014-12-01 12:04 ` [PATCH 3/5] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Shachar Raindel @ 2014-12-01 12:04 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

When do_wp_page is ending, in several cases it needs to unlock the
pages and ptls it was accessing.

Currently, this logic is "called" by jumping to it with a goto, which
makes the control flow of the function harder to follow.

As the code can easily be refactored into a specialized function,
refactor it out and simplify the code flow at the call sites.

Using goto for cleanup is generally allowed by the coding style
guidelines. However, extracting the cleanup into a separate function
will allow deeper refactoring in the next patch.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 66 +++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 21 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 61334e9..dd3bb13 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2089,6 +2089,40 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 /*
+ * Release the ptl lock, as well as the page references do_wp_page took.
+ *
+ * This function releases any locking and references remaining at the
+ * end of do_wp_page. The ptl lock is taken before do_wp_page is
+ * called. The old_page reference is taken early in the execution
+ * of do_wp_page. However, in the OOM case we need to clean up only the
+ * page-cache reference and not the ptl lock, which was dropped
+ * earlier. This results in a highly asymmetric release path.
+ */
+static int wp_page_unlock(struct mm_struct *mm, struct vm_area_struct *vma,
+			  pte_t *page_table, spinlock_t *ptl,
+			  unsigned long mmun_start, unsigned long mmun_end,
+			  struct page *old_page, int page_copied)
+	__releases(ptl)
+{
+	pte_unmap_unlock(page_table, ptl);
+	if (mmun_end > mmun_start)
+		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	if (old_page) {
+		/*
+		 * Don't let another task, with possibly unlocked vma,
+		 * keep the mlocked page.
+		 */
+		if (page_copied && (vma->vm_flags & VM_LOCKED)) {
+			lock_page(old_page);	/* LRU manipulation */
+			munlock_vma_page(old_page);
+			unlock_page(old_page);
+		}
+		page_cache_release(old_page);
+	}
+	return page_copied ? VM_FAULT_WRITE : 0;
+}
+
+/*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
  * and decrementing the shared-page counter for the old page.
@@ -2113,7 +2147,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 {
 	struct page *old_page, *new_page = NULL;
 	pte_t entry;
-	int ret = 0;
+	int page_copied = 0;
 	unsigned long mmun_start = 0;	/* For mmu_notifiers */
 	unsigned long mmun_end = 0;	/* For mmu_notifiers */
 	struct mem_cgroup *memcg;
@@ -2148,7 +2182,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 							 &ptl);
 			if (!pte_same(*page_table, orig_pte)) {
 				unlock_page(old_page);
-				goto unlock;
+				return wp_page_unlock(mm, vma, page_table, ptl,
+						      0, 0,
+						      old_page, 0);
 			}
 			page_cache_release(old_page);
 		}
@@ -2193,7 +2229,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 							 &ptl);
 			if (!pte_same(*page_table, orig_pte)) {
 				unlock_page(old_page);
-				goto unlock;
+				return wp_page_unlock(mm, vma, page_table, ptl,
+						      0, 0,
+						      old_page, 0);
 			}
 
 			page_mkwrite = 1;
@@ -2293,29 +2331,15 @@ gotten:
 
 		/* Free the old page.. */
 		new_page = old_page;
-		ret |= VM_FAULT_WRITE;
+		page_copied = 1;
 	} else
 		mem_cgroup_cancel_charge(new_page, memcg);
 
 	if (new_page)
 		page_cache_release(new_page);
-unlock:
-	pte_unmap_unlock(page_table, ptl);
-	if (mmun_end > mmun_start)
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
-	if (old_page) {
-		/*
-		 * Don't let another task, with possibly unlocked vma,
-		 * keep the mlocked page.
-		 */
-		if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
-			lock_page(old_page);	/* LRU manipulation */
-			munlock_vma_page(old_page);
-			unlock_page(old_page);
-		}
-		page_cache_release(old_page);
-	}
-	return ret;
+
+	return wp_page_unlock(mm, vma, page_table, ptl, mmun_start,
+			      mmun_end, old_page, page_copied);
 oom_free_new:
 	page_cache_release(new_page);
 oom:
-- 
1.7.11.2



* [PATCH 3/5] mm: refactor do_wp_page, extract the page copy flow
  2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
  2014-12-01 12:04 ` [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
  2014-12-01 12:04 ` [PATCH 2/5] mm: Refactor do_wp_page - extract the unlock flow Shachar Raindel
@ 2014-12-01 12:04 ` Shachar Raindel
  2014-12-01 12:57   ` Kirill A. Shutemov
  2014-12-01 12:04 ` [PATCH 4/5] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Shachar Raindel @ 2014-12-01 12:04 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

In some cases, do_wp_page has to copy the page suffering a write fault
to a new location. If the function logic decided to do this, it was
done by jumping to the relevant code block with a "goto" operation.
This made the code really hard to understand. It is also against the
kernel coding style guidelines.

This patch extracts the page copy and page table update logic into a
separate function. It also cleans up the naming, from "gotten" to
"wp_page_copy", and adds a few comments.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 238 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 131 insertions(+), 107 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index dd3bb13..436012d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2123,6 +2123,132 @@ static int wp_page_unlock(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 /*
+ * Handle the case of a page which we actually need to copy to a new page.
+ *
+ * High level logic flow:
+ *
+ * - Drop the PTL, allocate a page, copy the content.
+ * - Handle bookkeeping and accounting - cgroups, mmu-notifiers, etc.
+ * - Regain the PTL. If the pte changed, bail out and release the allocated page.
+ * - If the pte is still the way we remember it, update the page table and all
+ *   relevant references. This includes dropping the reference the page-table
+ *   held to the old page, as well as updating the rmap.
+ * - In any case, unlock the PTL and drop the reference we took to the old page.
+ */
+static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long address, pte_t *page_table, pmd_t *pmd,
+			spinlock_t *ptl, pte_t orig_pte, struct page *old_page)
+	__releases(ptl)
+{
+	struct page *new_page = NULL;
+	pte_t entry;
+	int page_copied = 0;
+	const unsigned long mmun_start = address & PAGE_MASK;	/* For mmu_notifiers */
+	const unsigned long mmun_end = mmun_start + PAGE_SIZE;	/* For mmu_notifiers */
+	struct mem_cgroup *memcg;
+
+	pte_unmap_unlock(page_table, ptl);
+
+	if (unlikely(anon_vma_prepare(vma)))
+		goto oom;
+
+	if (is_zero_pfn(pte_pfn(orig_pte))) {
+		new_page = alloc_zeroed_user_highpage_movable(vma, address);
+		if (!new_page)
+			goto oom;
+	} else {
+		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+		if (!new_page)
+			goto oom;
+		cow_user_page(new_page, old_page, address, vma);
+	}
+	__SetPageUptodate(new_page);
+
+	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
+		goto oom_free_new;
+
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+
+	/*
+	 * Re-check the pte - we dropped the lock
+	 */
+	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (likely(pte_same(*page_table, orig_pte))) {
+		if (old_page) {
+			if (!PageAnon(old_page)) {
+				dec_mm_counter_fast(mm, MM_FILEPAGES);
+				inc_mm_counter_fast(mm, MM_ANONPAGES);
+			}
+		} else {
+			inc_mm_counter_fast(mm, MM_ANONPAGES);
+		}
+		flush_cache_page(vma, address, pte_pfn(orig_pte));
+		entry = mk_pte(new_page, vma->vm_page_prot);
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		/*
+		 * Clear the pte entry and flush it first, before updating the
+		 * pte with the new entry. This will avoid a race condition
+		 * seen in the presence of one thread doing SMC and another
+		 * thread doing COW.
+		 */
+		ptep_clear_flush(vma, address, page_table);
+		page_add_new_anon_rmap(new_page, vma, address);
+		mem_cgroup_commit_charge(new_page, memcg, false);
+		lru_cache_add_active_or_unevictable(new_page, vma);
+		/*
+		 * We call the notify macro here because, when using secondary
+		 * mmu page tables (such as kvm shadow page tables), we want the
+		 * new page to be mapped directly into the secondary page table.
+		 */
+		set_pte_at_notify(mm, address, page_table, entry);
+		update_mmu_cache(vma, address, page_table);
+		if (old_page) {
+			/*
+			 * Only after switching the pte to the new page may
+			 * we remove the mapcount here. Otherwise another
+			 * process may come and find the rmap count decremented
+			 * before the pte is switched to the new page, and
+			 * "reuse" the old page writing into it while our pte
+			 * here still points into it and can be read by other
+			 * threads.
+			 *
+			 * The critical issue is to order this
+			 * page_remove_rmap with the ptp_clear_flush above.
+			 * Those stores are ordered by (if nothing else,)
+			 * the barrier present in the atomic_add_negative
+			 * in page_remove_rmap.
+			 *
+			 * Then the TLB flush in ptep_clear_flush ensures that
+			 * no process can access the old page before the
+			 * decremented mapcount is visible. And the old page
+			 * cannot be reused until after the decremented
+			 * mapcount is visible. So transitively, TLBs to
+			 * old page will be flushed before it can be reused.
+			 */
+			page_remove_rmap(old_page);
+		}
+
+		/* Free the old page.. */
+		new_page = old_page;
+		page_copied = 1;
+	} else {
+		mem_cgroup_cancel_charge(new_page, memcg);
+	}
+
+	if (new_page)
+		page_cache_release(new_page);
+
+	return wp_page_unlock(mm, vma, page_table, ptl, mmun_start,
+			      mmun_end, old_page, page_copied);
+oom_free_new:
+	page_cache_release(new_page);
+oom:
+	if (old_page)
+		page_cache_release(old_page);
+	return VM_FAULT_OOM;
+}
+
+/*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
  * and decrementing the shared-page counter for the old page.
@@ -2145,12 +2271,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		spinlock_t *ptl, pte_t orig_pte)
 	__releases(ptl)
 {
-	struct page *old_page, *new_page = NULL;
-	pte_t entry;
-	int page_copied = 0;
-	unsigned long mmun_start = 0;	/* For mmu_notifiers */
-	unsigned long mmun_end = 0;	/* For mmu_notifiers */
-	struct mem_cgroup *memcg;
+	struct page *old_page;
 
 	old_page = vm_normal_page(vma, address, orig_pte);
 	if (!old_page) {
@@ -2166,7 +2287,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				     (VM_WRITE|VM_SHARED))
 			return wp_page_reuse(mm, vma, address, page_table, ptl,
 					     orig_pte, old_page, 0, 0);
-		goto gotten;
+		return wp_page_copy(mm, vma, address, page_table, pmd,
+				    ptl, orig_pte, old_page);
 	}
 
 	/*
@@ -2246,106 +2368,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * Ok, we need to copy. Oh, well..
 	 */
 	page_cache_get(old_page);
-gotten:
-	pte_unmap_unlock(page_table, ptl);
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
-
-	if (is_zero_pfn(pte_pfn(orig_pte))) {
-		new_page = alloc_zeroed_user_highpage_movable(vma, address);
-		if (!new_page)
-			goto oom;
-	} else {
-		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
-		if (!new_page)
-			goto oom;
-		cow_user_page(new_page, old_page, address, vma);
-	}
-	__SetPageUptodate(new_page);
-
-	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
-		goto oom_free_new;
-
-	mmun_start  = address & PAGE_MASK;
-	mmun_end    = mmun_start + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
-
-	/*
-	 * Re-check the pte - we dropped the lock
-	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (likely(pte_same(*page_table, orig_pte))) {
-		if (old_page) {
-			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
-				inc_mm_counter_fast(mm, MM_ANONPAGES);
-			}
-		} else
-			inc_mm_counter_fast(mm, MM_ANONPAGES);
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = mk_pte(new_page, vma->vm_page_prot);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-		/*
-		 * Clear the pte entry and flush it first, before updating the
-		 * pte with the new entry. This will avoid a race condition
-		 * seen in the presence of one thread doing SMC and another
-		 * thread doing COW.
-		 */
-		ptep_clear_flush(vma, address, page_table);
-		page_add_new_anon_rmap(new_page, vma, address);
-		mem_cgroup_commit_charge(new_page, memcg, false);
-		lru_cache_add_active_or_unevictable(new_page, vma);
-		/*
-		 * We call the notify macro here because, when using secondary
-		 * mmu page tables (such as kvm shadow page tables), we want the
-		 * new page to be mapped directly into the secondary page table.
-		 */
-		set_pte_at_notify(mm, address, page_table, entry);
-		update_mmu_cache(vma, address, page_table);
-		if (old_page) {
-			/*
-			 * Only after switching the pte to the new page may
-			 * we remove the mapcount here. Otherwise another
-			 * process may come and find the rmap count decremented
-			 * before the pte is switched to the new page, and
-			 * "reuse" the old page writing into it while our pte
-			 * here still points into it and can be read by other
-			 * threads.
-			 *
-			 * The critical issue is to order this
-			 * page_remove_rmap with the ptp_clear_flush above.
-			 * Those stores are ordered by (if nothing else,)
-			 * the barrier present in the atomic_add_negative
-			 * in page_remove_rmap.
-			 *
-			 * Then the TLB flush in ptep_clear_flush ensures that
-			 * no process can access the old page before the
-			 * decremented mapcount is visible. And the old page
-			 * cannot be reused until after the decremented
-			 * mapcount is visible. So transitively, TLBs to
-			 * old page will be flushed before it can be reused.
-			 */
-			page_remove_rmap(old_page);
-		}
-
-		/* Free the old page.. */
-		new_page = old_page;
-		page_copied = 1;
-	} else
-		mem_cgroup_cancel_charge(new_page, memcg);
-
-	if (new_page)
-		page_cache_release(new_page);
-
-	return wp_page_unlock(mm, vma, page_table, ptl, mmun_start,
-			      mmun_end, old_page, page_copied);
-oom_free_new:
-	page_cache_release(new_page);
-oom:
-	if (old_page)
-		page_cache_release(old_page);
-	return VM_FAULT_OOM;
+	return wp_page_copy(mm, vma, address, page_table, pmd,
+			    ptl, orig_pte, old_page);
 }
 
 static void unmap_mapping_range_vma(struct vm_area_struct *vma,
-- 
1.7.11.2



* [PATCH 4/5] mm: Refactor do_wp_page handling of shared vma into a function
  2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
                   ` (2 preceding siblings ...)
  2014-12-01 12:04 ` [PATCH 3/5] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
@ 2014-12-01 12:04 ` Shachar Raindel
  2014-12-01 13:03   ` Kirill A. Shutemov
  2014-12-01 12:04 ` [PATCH 5/5] mm: Move the MMU-notifier code from wp_page_unlock to wp_page_copy Shachar Raindel
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Shachar Raindel @ 2014-12-01 12:04 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

The do_wp_page function is extremely long. Extract the logic for
handling a page belonging to a shared vma into a function of its own.

This helps the readability of the code, without making any functional
change to it.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 87 ++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 49 insertions(+), 38 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 436012d..8023cf3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2248,6 +2248,53 @@ oom:
 	return VM_FAULT_OOM;
 }
 
+static int wp_page_shared_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+			      unsigned long address, pte_t *page_table,
+			      pmd_t *pmd, spinlock_t *ptl, pte_t orig_pte,
+			      struct page *old_page)
+	__releases(ptl)
+{
+	int page_mkwrite = 0;
+
+	/*
+	 * Only catch write-faults on shared writable pages,
+	 * read-only shared pages can get COWed by
+	 * get_user_pages(.write=1, .force=1).
+	 */
+	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
+		int tmp;
+
+		page_cache_get(old_page);
+		pte_unmap_unlock(page_table, ptl);
+		tmp = do_page_mkwrite(vma, old_page, address);
+		if (unlikely(!tmp || (tmp &
+				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
+			page_cache_release(old_page);
+			return tmp;
+		}
+		/*
+		 * Since we dropped the lock we need to revalidate
+		 * the PTE as someone else may have changed it.  If
+		 * they did, we just return, as we can count on the
+		 * MMU to tell us if they didn't also make it writable.
+		 */
+		page_table = pte_offset_map_lock(mm, pmd, address,
+						 &ptl);
+		if (!pte_same(*page_table, orig_pte)) {
+			unlock_page(old_page);
+			return wp_page_unlock(mm, vma, page_table, ptl,
+					      0, 0,
+					      old_page, 0);
+		}
+
+		page_mkwrite = 1;
+	}
+	get_page(old_page);
+
+	return wp_page_reuse(mm, vma, address, page_table, ptl, orig_pte,
+			     old_page, 1, page_mkwrite);
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2324,44 +2371,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		unlock_page(old_page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
-		int page_mkwrite = 0;
-
-		/*
-		 * Only catch write-faults on shared writable pages,
-		 * read-only shared pages can get COWed by
-		 * get_user_pages(.write=1, .force=1).
-		 */
-		if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
-			int tmp;
-			page_cache_get(old_page);
-			pte_unmap_unlock(page_table, ptl);
-			tmp = do_page_mkwrite(vma, old_page, address);
-			if (unlikely(!tmp || (tmp &
-					(VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
-				page_cache_release(old_page);
-				return tmp;
-			}
-			/*
-			 * Since we dropped the lock we need to revalidate
-			 * the PTE as someone else may have changed it.  If
-			 * they did, we just return, as we can count on the
-			 * MMU to tell us if they didn't also make it writable.
-			 */
-			page_table = pte_offset_map_lock(mm, pmd, address,
-							 &ptl);
-			if (!pte_same(*page_table, orig_pte)) {
-				unlock_page(old_page);
-				return wp_page_unlock(mm, vma, page_table, ptl,
-						      0, 0,
-						      old_page, 0);
-			}
-
-			page_mkwrite = 1;
-		}
-		get_page(old_page);
-
-		return wp_page_reuse(mm, vma, address, page_table, ptl,
-				     orig_pte, old_page, 1, page_mkwrite);
+		return wp_page_shared_vma(mm, vma, address, page_table, pmd,
+					  ptl, orig_pte, old_page);
 	}
 
 	/*
-- 
1.7.11.2



* [PATCH 5/5] mm: Move the MMU-notifier code from wp_page_unlock to wp_page_copy
  2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
                   ` (3 preceding siblings ...)
  2014-12-01 12:04 ` [PATCH 4/5] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
@ 2014-12-01 12:04 ` Shachar Raindel
  2014-12-01 16:47 ` [PATCH 0/5] Refactor do_wp_page, no functional change Linus Torvalds
  2014-12-01 16:53 ` Andi Kleen
  6 siblings, 0 replies; 14+ messages in thread
From: Shachar Raindel @ 2014-12-01 12:04 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

The MMU-notifier code is needed only in the case where we copied a
page. In the original code, the unlock step was shared by all exit
paths, so the notifier call had to be handled there as a conditional
special case. Now that the unlock flow is a separate function, this is
no longer necessary: we explicitly call
mmu_notifier_invalidate_range_end in wp_page_copy, after finishing all
of the unlock logic. This also makes it much easier to see the pairing
of mmu_notifier_invalidate_range_start and
mmu_notifier_invalidate_range_end in the same function.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 8023cf3..68fab34 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2100,13 +2100,10 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
  */
 static int wp_page_unlock(struct mm_struct *mm, struct vm_area_struct *vma,
 			  pte_t *page_table, spinlock_t *ptl,
-			  unsigned long mmun_start, unsigned long mmun_end,
 			  struct page *old_page, int page_copied)
 	__releases(ptl)
 {
 	pte_unmap_unlock(page_table, ptl);
-	if (mmun_end > mmun_start)
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 	if (old_page) {
 		/*
 		 * Don't let another task, with possibly unlocked vma,
@@ -2143,6 +2140,7 @@ static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *new_page = NULL;
 	pte_t entry;
 	int page_copied = 0;
+	int ret = 0;
 	const unsigned long mmun_start = address & PAGE_MASK;	/* For mmu_notifiers */
 	const unsigned long mmun_end = mmun_start + PAGE_SIZE;	/* For mmu_notifiers */
 	struct mem_cgroup *memcg;
@@ -2238,8 +2236,9 @@ static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (new_page)
 		page_cache_release(new_page);
 
-	return wp_page_unlock(mm, vma, page_table, ptl, mmun_start,
-			      mmun_end, old_page, page_copied);
+	ret = wp_page_unlock(mm, vma, page_table, ptl, old_page, page_copied);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	return ret;
 oom_free_new:
 	page_cache_release(new_page);
 oom:
@@ -2283,7 +2282,6 @@ static int wp_page_shared_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (!pte_same(*page_table, orig_pte)) {
 			unlock_page(old_page);
 			return wp_page_unlock(mm, vma, page_table, ptl,
-					      0, 0,
 					      old_page, 0);
 		}
 
@@ -2352,7 +2350,6 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			if (!pte_same(*page_table, orig_pte)) {
 				unlock_page(old_page);
 				return wp_page_unlock(mm, vma, page_table, ptl,
-						      0, 0,
 						      old_page, 0);
 			}
 			page_cache_release(old_page);
-- 
1.7.11.2


* Re: [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case
  2014-12-01 12:04 ` [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
@ 2014-12-01 12:30   ` Kirill A. Shutemov
  2014-12-01 12:34     ` Shachar Raindel
  0 siblings, 1 reply; 14+ messages in thread
From: Kirill A. Shutemov @ 2014-12-01 12:30 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken

On Mon, Dec 01, 2014 at 02:04:41PM +0200, Shachar Raindel wrote:
> When do_wp_page is ending, in several cases it needs to reuse the
> existing page. This is achieved by making the page table writable,
> and possibly updating the page-cache state.
> 
> Until now, this logic was "called" by using a goto jump. This makes
> following the control flow of the function harder. It also goes
> against the coding style guidelines on the use of goto.
> 
> As the code can easily be refactored into a specialized function,
> refactor it out and simplify the code flow in do_wp_page.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> ---
>  mm/memory.c | 136 ++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 78 insertions(+), 58 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 3e50383..61334e9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2020,6 +2020,75 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
>  }
>  
>  /*
> + * Handle write page faults for pages that can be reused in the current vma
> + *
> + * This can happen either due to the mapping being with the VM_SHARED flag,
> + * or due to us being the last reference standing to the page. In either
> + * case, all we need to do here is to mark the page as writable and update
> + * any related book-keeping.
> + */
> +static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
> +			 unsigned long address, pte_t *page_table,
> +			 spinlock_t *ptl, pte_t orig_pte,
> +			 struct page *recycled_page, int dirty_page,

recycled_page? what's wrong with old_page?

Otherwise:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* RE: [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case
  2014-12-01 12:30   ` Kirill A. Shutemov
@ 2014-12-01 12:34     ` Shachar Raindel
  2014-12-01 12:50       ` Kirill A. Shutemov
  0 siblings, 1 reply; 14+ messages in thread
From: Shachar Raindel @ 2014-12-01 12:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, Haggai Eran, aarcange,
	pfeiner, hannes, Sagi Grimberg, walken



> -----Original Message-----
> From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
> Sent: Monday, December 01, 2014 2:31 PM
> To: Shachar Raindel
> Cc: linux-mm@kvack.org; kirill.shutemov@linux.intel.com;
> mgorman@suse.de; riel@redhat.com; ak@linux.intel.com;
> matthew.r.wilcox@intel.com; dave.hansen@linux.intel.com; n-
> horiguchi@ah.jp.nec.com; akpm@linux-foundation.org; torvalds@linux-
> foundation.org; Haggai Eran; aarcange@redhat.com; pfeiner@google.com;
> hannes@cmpxchg.org; Sagi Grimberg; walken@google.com
> Subject: Re: [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case
> 
> On Mon, Dec 01, 2014 at 02:04:41PM +0200, Shachar Raindel wrote:
> > When do_wp_page is ending, in several cases it needs to reuse the
> > existing page. This is achieved by making the page table writable,
> > and possibly updating the page-cache state.
> >
> > Until now, this logic was "called" by using a goto jump. This makes
> > following the control flow of the function harder. It also goes
> > against the coding style guidelines on the use of goto.
> >
> > As the code can easily be refactored into a specialized function,
> > refactor it out and simplify the code flow in do_wp_page.
> >
> > Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> > ---
> >  mm/memory.c | 136 ++++++++++++++++++++++++++++++++++--------------------------
> >  1 file changed, 78 insertions(+), 58 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 3e50383..61334e9 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2020,6 +2020,75 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> >  }
> >
> >  /*
> > + * Handle write page faults for pages that can be reused in the current vma
> > + *
> > + * This can happen either due to the mapping being with the VM_SHARED flag,
> > + * or due to us being the last reference standing to the page. In either
> > + * case, all we need to do here is to mark the page as writable and update
> > + * any related book-keeping.
> > + */
> > +static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
> > +			 unsigned long address, pte_t *page_table,
> > +			 spinlock_t *ptl, pte_t orig_pte,
> > +			 struct page *recycled_page, int dirty_page,
> 
> recycled_page? what's wrong with old_page?
> 

You are reusing the page in this function, so I felt that naming it
"old" was less indicative than "recycled". However, if you prefer "old", I
am happy with it.

> Otherwise:
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 

Thanks :)

--Shachar


* Re: [PATCH 2/5] mm: Refactor do_wp_page - extract the unlock flow
  2014-12-01 12:04 ` [PATCH 2/5] mm: Refactor do_wp_page - extract the unlock flow Shachar Raindel
@ 2014-12-01 12:43   ` Kirill A. Shutemov
  0 siblings, 0 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2014-12-01 12:43 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken

On Mon, Dec 01, 2014 at 02:04:42PM +0200, Shachar Raindel wrote:
> When do_wp_page is ending, in several cases it needs to unlock the
> pages and ptls it was accessing.
> 
> Until now, this logic was "called" by using a goto jump. This makes
> following the control flow of the function harder. It also goes
> against the coding style guidelines on the use of goto.
> 
> As the code can easily be refactored into a specialized function,
> refactor it out and simplify the callsites code flow.
> 
> Using goto for cleanup is generally allowed. However, extracting the
> cleanup to a separate function will allow deeper refactoring in the
> next patch.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> ---
>  mm/memory.c | 66 +++++++++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 45 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 61334e9..dd3bb13 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2089,6 +2089,40 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
>  }
>  
>  /*
> + * Release the ptl locking, as well as the page references do_wp_page took.
> + *
> + * This function releases any locking and references remaining at the
> + * end of do_wp_page. The ptl lock is taken before do_wp_page is
> + * called. The old_page page reference is taken early in the execution
> + * of do_wp_page. However, in the OOM case we need to clean up only the
> + * page-cache reference and not the ptl lock, which was dropped
> + * earlier. This results in a highly asymmetric release path.
> + */
> +static int wp_page_unlock(struct mm_struct *mm, struct vm_area_struct *vma,
> +			  pte_t *page_table, spinlock_t *ptl,
> +			  unsigned long mmun_start, unsigned long mmun_end,
> +			  struct page *old_page, int page_copied)
> +	__releases(ptl)
> +{
> +	pte_unmap_unlock(page_table, ptl);
> +	if (mmun_end > mmun_start)
> +		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> +	if (old_page) {
> +		/*
> +		 * Don't let another task, with possibly unlocked vma,
> +		 * keep the mlocked page.
> +		 */
> +		if (page_copied && (vma->vm_flags & VM_LOCKED)) {
> +			lock_page(old_page);	/* LRU manipulation */
> +			munlock_vma_page(old_page);
> +			unlock_page(old_page);
> +		}
> +		page_cache_release(old_page);
> +	}
> +	return page_copied ? VM_FAULT_WRITE : 0;
> +}
> +
> +/*
>   * This routine handles present pages, when users try to write
>   * to a shared page. It is done by copying the page to a new address
>   * and decrementing the shared-page counter for the old page.
> @@ -2113,7 +2147,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  {
>  	struct page *old_page, *new_page = NULL;
>  	pte_t entry;
> -	int ret = 0;
> +	int page_copied = 0;
>  	unsigned long mmun_start = 0;	/* For mmu_notifiers */
>  	unsigned long mmun_end = 0;	/* For mmu_notifiers */
>  	struct mem_cgroup *memcg;
> @@ -2148,7 +2182,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  							 &ptl);
>  			if (!pte_same(*page_table, orig_pte)) {
>  				unlock_page(old_page);
> -				goto unlock;
> +				return wp_page_unlock(mm, vma, page_table, ptl,
> +						      0, 0,
> +						      old_page, 0);

For me, it makes the code more confusing, not less.
Here we could just do something plain like:

				pte_unmap_unlock(page_table, ptl);
				page_cache_release(old_page);
				return 0;


>  			}
>  			page_cache_release(old_page);
>  		}
> @@ -2193,7 +2229,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  							 &ptl);
>  			if (!pte_same(*page_table, orig_pte)) {
>  				unlock_page(old_page);
> -				goto unlock;
> +				return wp_page_unlock(mm, vma, page_table, ptl,
> +						      0, 0,
> +						      old_page, 0);

Ditto.

>  			}
>  
>  			page_mkwrite = 1;
> @@ -2293,29 +2331,15 @@ gotten:
>  
>  		/* Free the old page.. */
>  		new_page = old_page;
> -		ret |= VM_FAULT_WRITE;
> +		page_copied = 1;
>  	} else
>  		mem_cgroup_cancel_charge(new_page, memcg);
>  
>  	if (new_page)
>  		page_cache_release(new_page);
> -unlock:

And now we don't need the label and can leave code in place.

> -	pte_unmap_unlock(page_table, ptl);
> -	if (mmun_end > mmun_start)
> -		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> -	if (old_page) {
> -		/*
> -		 * Don't let another task, with possibly unlocked vma,
> -		 * keep the mlocked page.
> -		 */
> -		if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
> -			lock_page(old_page);	/* LRU manipulation */
> -			munlock_vma_page(old_page);
> -			unlock_page(old_page);
> -		}
> -		page_cache_release(old_page);
> -	}
> -	return ret;
> +
> +	return wp_page_unlock(mm, vma, page_table, ptl, mmun_start,
> +			      mmun_end, old_page, page_copied);
>  oom_free_new:
>  	page_cache_release(new_page);
>  oom:
> -- 
> 1.7.11.2
> 

-- 
 Kirill A. Shutemov


* Re: [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case
  2014-12-01 12:34     ` Shachar Raindel
@ 2014-12-01 12:50       ` Kirill A. Shutemov
  0 siblings, 0 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2014-12-01 12:50 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, Haggai Eran, aarcange,
	pfeiner, hannes, Sagi Grimberg, walken

On Mon, Dec 01, 2014 at 12:34:30PM +0000, Shachar Raindel wrote:
> 
> 
> > -----Original Message-----
> > From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
> > Sent: Monday, December 01, 2014 2:31 PM
> > To: Shachar Raindel
> > Cc: linux-mm@kvack.org; kirill.shutemov@linux.intel.com;
> > mgorman@suse.de; riel@redhat.com; ak@linux.intel.com;
> > matthew.r.wilcox@intel.com; dave.hansen@linux.intel.com; n-
> > horiguchi@ah.jp.nec.com; akpm@linux-foundation.org; torvalds@linux-
> > foundation.org; Haggai Eran; aarcange@redhat.com; pfeiner@google.com;
> > hannes@cmpxchg.org; Sagi Grimberg; walken@google.com
> > Subject: Re: [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case
> > 
> > On Mon, Dec 01, 2014 at 02:04:41PM +0200, Shachar Raindel wrote:
> > > When do_wp_page is ending, in several cases it needs to reuse the
> > > existing page. This is achieved by making the page table writable,
> > > and possibly updating the page-cache state.
> > >
> > > Until now, this logic was "called" by using a goto jump. This makes
> > > following the control flow of the function harder. It also goes
> > > against the coding style guidelines on the use of goto.
> > >
> > > As the code can easily be refactored into a specialized function,
> > > refactor it out and simplify the code flow in do_wp_page.
> > >
> > > Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> > > ---
> > >  mm/memory.c | 136 ++++++++++++++++++++++++++++++++++--------------------------
> > >  1 file changed, 78 insertions(+), 58 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 3e50383..61334e9 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -2020,6 +2020,75 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> > >  }
> > >
> > >  /*
> > > + * Handle write page faults for pages that can be reused in the current vma
> > > + *
> > > + * This can happen either due to the mapping being with the VM_SHARED flag,
> > > + * or due to us being the last reference standing to the page. In either
> > > + * case, all we need to do here is to mark the page as writable and update
> > > + * any related book-keeping.
> > > + */
> > > +static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
> > > +			 unsigned long address, pte_t *page_table,
> > > +			 spinlock_t *ptl, pte_t orig_pte,
> > > +			 struct page *recycled_page, int dirty_page,
> > 
> > recycled_page? what's wrong with old_page?
> > 
> 
> You are reusing the page in this function, so I felt that naming it
> "old" was less indicative than "recycled". However, if you prefer "old", I
> am happy with it.

Since it's the only page in scope, call it simply 'page'.

-- 
 Kirill A. Shutemov


* Re: [PATCH 3/5] mm: refactor do_wp_page, extract the page copy flow
  2014-12-01 12:04 ` [PATCH 3/5] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
@ 2014-12-01 12:57   ` Kirill A. Shutemov
  0 siblings, 0 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2014-12-01 12:57 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken

On Mon, Dec 01, 2014 at 02:04:43PM +0200, Shachar Raindel wrote:
> In some cases, do_wp_page had to copy the page suffering a write fault
> to a new location. When the function logic decided to do this, it
> was done by jumping with a "goto" operation to the relevant code
> block. This made the code really hard to understand, and it also goes
> against the kernel coding style guidelines.
> 
> This patch extracts the page copy and page table update logic to a
> separate function. It also cleans up the naming, from "gotten" to
> "wp_page_copy", and adds a few comments.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> ---
>  mm/memory.c | 238 +++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 131 insertions(+), 107 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index dd3bb13..436012d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2123,6 +2123,132 @@ static int wp_page_unlock(struct mm_struct *mm, struct vm_area_struct *vma,
>  }
>  
>  /*
> + * Handle the case of a page which we actually need to copy to a new page.
> + *
> + * High level logic flow:
> + *
> + * - Drop the PTL, allocate a page, copy the content.
> + * - Handle book keeping and accounting - cgroups, mmu-notifiers, etc.
> + * - Regain the PTL. If the pte changed, bail out and release the allocated page
> + * - If the pte is still the way we remember it, update the page table and all
> + *   relevant references. This includes dropping the reference the page-table
> + *   held to the old page, as well as updating the rmap.
> + * - In any case, unlock the PTL and drop the reference we took to the old page.
> + */
> +static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
> +			unsigned long address, pte_t *page_table, pmd_t *pmd,
> +			spinlock_t *ptl, pte_t orig_pte, struct page *old_page)
> +	__releases(ptl)
> +{
> +	struct page *new_page = NULL;
> +	pte_t entry;
> +	int page_copied = 0;
> +	const unsigned long mmun_start = address & PAGE_MASK;	/* For mmu_notifiers */
> +	const unsigned long mmun_end = mmun_start + PAGE_SIZE;	/* For mmu_notifiers */
> +	struct mem_cgroup *memcg;
> +
> +	pte_unmap_unlock(page_table, ptl);

Move the ptl unlock to the caller. Then there is no need for the
__releases(ptl) annotation, and the argument list gets shorter.

Otherwise:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCH 4/5] mm: Refactor do_wp_page handling of shared vma into a function
  2014-12-01 12:04 ` [PATCH 4/5] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
@ 2014-12-01 13:03   ` Kirill A. Shutemov
  0 siblings, 0 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2014-12-01 13:03 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: linux-mm, kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken

On Mon, Dec 01, 2014 at 02:04:44PM +0200, Shachar Raindel wrote:
> The do_wp_page function is extremely long. Extract the logic for
> handling a page belonging to a shared vma into a function of its own.
> 
> This helps the readability of the code, without doing any functional
> change in it.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> ---
>  mm/memory.c | 87 ++++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 49 insertions(+), 38 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 436012d..8023cf3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2248,6 +2248,53 @@ oom:
>  	return VM_FAULT_OOM;
>  }
>  
> +static int wp_page_shared_vma(struct mm_struct *mm, struct vm_area_struct *vma,

wp_page_shared() is enough; there is no need for the _vma suffix.

> +			      unsigned long address, pte_t *page_table,
> +			      pmd_t *pmd, spinlock_t *ptl, pte_t orig_pte,
> +			      struct page *old_page)
> +	__releases(ptl)
> +{
> +	int page_mkwrite = 0;
> +
> +	/*
> +	 * Only catch write-faults on shared writable pages,
> +	 * read-only shared pages can get COWed by
> +	 * get_user_pages(.write=1, .force=1).
> +	 */
> +	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {

Inverting the check would reduce the indentation level of the code below.

> +		int tmp;
> +
> +		page_cache_get(old_page);
> +		pte_unmap_unlock(page_table, ptl);
> +		tmp = do_page_mkwrite(vma, old_page, address);
> +		if (unlikely(!tmp || (tmp &
> +				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
> +			page_cache_release(old_page);
> +			return tmp;
> +		}
> +		/*
> +		 * Since we dropped the lock we need to revalidate
> +		 * the PTE as someone else may have changed it.  If
> +		 * they did, we just return, as we can count on the
> +		 * MMU to tell us if they didn't also make it writable.
> +		 */
> +		page_table = pte_offset_map_lock(mm, pmd, address,
> +						 &ptl);
> +		if (!pte_same(*page_table, orig_pte)) {
> +			unlock_page(old_page);
> +			return wp_page_unlock(mm, vma, page_table, ptl,
> +					      0, 0,
> +					      old_page, 0);
> +		}
> +
> +		page_mkwrite = 1;
> +	}
> +	get_page(old_page);
> +
> +	return wp_page_reuse(mm, vma, address, page_table, ptl, orig_pte,
> +			     old_page, 1, page_mkwrite);
> +}
> +
>  /*
>   * This routine handles present pages, when users try to write
>   * to a shared page. It is done by copying the page to a new address
> @@ -2324,44 +2371,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		unlock_page(old_page);
>  	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
>  					(VM_WRITE|VM_SHARED))) {
> -		int page_mkwrite = 0;
> -
> -		/*
> -		 * Only catch write-faults on shared writable pages,
> -		 * read-only shared pages can get COWed by
> -		 * get_user_pages(.write=1, .force=1).
> -		 */
> -		if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
> -			int tmp;
> -			page_cache_get(old_page);
> -			pte_unmap_unlock(page_table, ptl);
> -			tmp = do_page_mkwrite(vma, old_page, address);
> -			if (unlikely(!tmp || (tmp &
> -					(VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
> -				page_cache_release(old_page);
> -				return tmp;
> -			}
> -			/*
> -			 * Since we dropped the lock we need to revalidate
> -			 * the PTE as someone else may have changed it.  If
> -			 * they did, we just return, as we can count on the
> -			 * MMU to tell us if they didn't also make it writable.
> -			 */
> -			page_table = pte_offset_map_lock(mm, pmd, address,
> -							 &ptl);
> -			if (!pte_same(*page_table, orig_pte)) {
> -				unlock_page(old_page);
> -				return wp_page_unlock(mm, vma, page_table, ptl,
> -						      0, 0,
> -						      old_page, 0);
> -			}
> -
> -			page_mkwrite = 1;
> -		}
> -		get_page(old_page);
> -
> -		return wp_page_reuse(mm, vma, address, page_table, ptl,
> -				     orig_pte, old_page, 1, page_mkwrite);
> +		return wp_page_shared_vma(mm, vma, address, page_table, pmd,
> +					  ptl, orig_pte, old_page);
>  	}
>  
>  	/*
> -- 
> 1.7.11.2
> 

-- 
 Kirill A. Shutemov


* Re: [PATCH 0/5] Refactor do_wp_page, no functional change
  2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
                   ` (4 preceding siblings ...)
  2014-12-01 12:04 ` [PATCH 5/5] mm: Move the MMU-notifier code from wp_page_unlock to wp_page_copy Shachar Raindel
@ 2014-12-01 16:47 ` Linus Torvalds
  2014-12-01 16:53 ` Andi Kleen
  6 siblings, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2014-12-01 16:47 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: linux-mm, Kirill A. Shutemov, Mel Gorman, Rik van Riel,
	Andi Kleen, Matthew Wilcox, Dave Hansen, Naoya Horiguchi,
	Andrew Morton, Haggai Eran, Andrea Arcangeli, Peter Feiner,
	Johannes Weiner, Sagi Grimberg, Michel Lespinasse

On Mon, Dec 1, 2014 at 4:04 AM, Shachar Raindel <raindel@mellanox.com> wrote:
>
> The patches have been tested using trinity on a KVM machine with 4
> vCPU, with all possible kernel debugging options enabled. So far, we
> have not seen any regressions. We have also tested the patches with
> internal tests we have that stress the MMU notifiers, again without
> seeing any issues.

Looks good. Please take Kirill's feedback, but apart from that:

  Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

(I assume this will come in through -mm as usual - Andrew, holler if not).

                     Linus


* Re: [PATCH 0/5] Refactor do_wp_page, no functional change
  2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
                   ` (5 preceding siblings ...)
  2014-12-01 16:47 ` [PATCH 0/5] Refactor do_wp_page, no functional change Linus Torvalds
@ 2014-12-01 16:53 ` Andi Kleen
  6 siblings, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2014-12-01 16:53 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: linux-mm, kirill.shutemov, mgorman, riel, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken


Looks good to me from a quick read.

Normally we try to keep the unlock in the same function, even
if it needs goto, but I guess it's ok to move it out in this
case.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


end of thread, other threads:[~2014-12-01 16:53 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-01 12:04 [PATCH 0/5] Refactor do_wp_page, no functional change Shachar Raindel
2014-12-01 12:04 ` [PATCH 1/5] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
2014-12-01 12:30   ` Kirill A. Shutemov
2014-12-01 12:34     ` Shachar Raindel
2014-12-01 12:50       ` Kirill A. Shutemov
2014-12-01 12:04 ` [PATCH 2/5] mm: Refactor do_wp_page - extract the unlock flow Shachar Raindel
2014-12-01 12:43   ` Kirill A. Shutemov
2014-12-01 12:04 ` [PATCH 3/5] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
2014-12-01 12:57   ` Kirill A. Shutemov
2014-12-01 12:04 ` [PATCH 4/5] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
2014-12-01 13:03   ` Kirill A. Shutemov
2014-12-01 12:04 ` [PATCH 5/5] mm: Move the MMU-notifier code from wp_page_unlock to wp_page_copy Shachar Raindel
2014-12-01 16:47 ` [PATCH 0/5] Refactor do_wp_page, no functional change Linus Torvalds
2014-12-01 16:53 ` Andi Kleen
