* [PATCH v2 0/4]  Refactor do_wp_page, no functional change
@ 2014-12-01 20:58 Shachar Raindel
  2014-12-01 20:58 ` [PATCH v2 1/4] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Shachar Raindel @ 2014-12-01 20:58 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

Currently, do_wp_page contains 265 lines of code. It also contains 9
goto statements, of which 5 target labels that are not cleanup
related. This makes the function extremely difficult to
understand. The following patches are an attempt at breaking the
function into its basic components, making it easier to understand.

The patches are straightforward function extractions from
do_wp_page. As we extract functions, we remove unneeded parameters and
simplify the code as much as possible. However, the functionality is
supposed to remain completely unchanged. The patches also attempt to
document the functionality of each extracted function. In patch 2, we
split the unlock logic so that each call site contains only the logic
relevant to its specific needs, instead of having a huge number of
conditional decisions in a single unlock flow.
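
To give a feel for the shape of the transformation, here is a
schematic sketch of the extraction pattern (made-up names, not the
actual mm/memory.c code):

	/* Before: shared tail reached via goto from several places */
	static int big_fault_handler(int can_reuse)
	{
		if (can_reuse)
			goto reuse;
		/* ... long copy path, falls through to reuse ... */
	reuse:
		/* ... reuse logic ... */
		return 0;
	}

	/* After: the shared tail becomes a documented helper */
	static int handle_reuse(void)
	{
		/* ... reuse logic, now self-contained ... */
		return 0;
	}

	static int big_fault_handler(int can_reuse)
	{
		if (can_reuse)
			return handle_reuse();
		/* ... long copy path ... */
		return 0;
	}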


Change log:

v0 -> v1:
- Minor renaming of an argument in patch 1
- Instead of having a complex unlock function, unlock the needed parts
  at the relevant call sites. Simplify code accordingly.
- Avoid calling wp_page_copy with the ptl held.
- Rename wp_page_shared_vma to wp_page_shared, and flip the logic of a
  check there to jump to the end of the function when there is no
  page_mkwrite handler, instead of having a large conditional block.

v1 -> v2:
- Cosmetic whitespace changes in patch 4

Many thanks to Kirill for reviewing the patches.

Shachar Raindel (4):
  mm: Refactor do_wp_page, extract the reuse case
  mm: Refactor do_wp_page - rewrite the unlock flow
  mm: refactor do_wp_page, extract the page copy flow
  mm: Refactor do_wp_page handling of shared vma into a function

 mm/memory.c | 393 +++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 227 insertions(+), 166 deletions(-)

-- 
1.7.11.2


* [PATCH v2 1/4] mm: Refactor do_wp_page, extract the reuse case
  2014-12-01 20:58 [PATCH v2 0/4] Refactor do_wp_page, no functional change Shachar Raindel
@ 2014-12-01 20:58 ` Shachar Raindel
  2014-12-01 22:23   ` Rik van Riel
  2014-12-01 20:58 ` [PATCH v2 2/4] mm: Refactor do_wp_page - rewrite the unlock flow Shachar Raindel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2014-12-01 20:58 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

When do_wp_page is ending, in several cases it needs to reuse the
existing page. This is achieved by making the page table writable,
and possibly updating the page-cache state.

Currently, this logic is "called" via a goto jump, which makes the
control flow of the function harder to follow. It also goes against
the kernel coding style guidelines for goto usage.

As the code can easily be refactored into a specialized function,
refactor it out and simplify the code flow in do_wp_page.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 135 ++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 77 insertions(+), 58 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3e50383..6bb5d42 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2020,6 +2020,74 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 }
 
 /*
+ * Handle write page faults for pages that can be reused in the current vma
+ *
+ * This can happen either due to the mapping having the VM_SHARED flag,
+ * or due to us being the last reference standing to the page. In either
+ * case, all we need to do here is to mark the page as writable and update
+ * any related book-keeping.
+ */
+static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
+			 unsigned long address, pte_t *page_table,
+			 spinlock_t *ptl, pte_t orig_pte,
+			 struct page *page, int dirty_page,
+			 int page_mkwrite)
+	__releases(ptl)
+{
+	pte_t entry;
+	/*
+	 * Clear the pages cpupid information as the existing
+	 * information potentially belongs to a now completely
+	 * unrelated process.
+	 */
+	if (page)
+		page_cpupid_xchg_last(page, (1 << LAST_CPUPID_SHIFT) - 1);
+
+	flush_cache_page(vma, address, pte_pfn(orig_pte));
+	entry = pte_mkyoung(orig_pte);
+	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	if (ptep_set_access_flags(vma, address, page_table, entry, 1))
+		update_mmu_cache(vma, address, page_table);
+	pte_unmap_unlock(page_table, ptl);
+
+	if (!dirty_page || !page)
+		return VM_FAULT_WRITE;
+
+	/*
+	 * Yes, Virginia, this is actually required to prevent a race
+	 * with clear_page_dirty_for_io() from clearing the page dirty
+	 * bit after it clear all dirty ptes, but before a racing
+	 * do_wp_page installs a dirty pte.
+	 *
+	 * do_shared_fault is protected similarly.
+	 */
+	if (!page_mkwrite) {
+		wait_on_page_locked(page);
+		set_page_dirty_balance(page);
+		/* file_update_time outside page_lock */
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
+	}
+	put_page(page);
+	if (page_mkwrite) {
+		struct address_space *mapping = page->mapping;
+
+		set_page_dirty(page);
+		unlock_page(page);
+		page_cache_release(page);
+		if (mapping)	{
+			/*
+			 * Some device drivers do not set page.mapping
+			 * but still dirty their pages
+			 */
+			balance_dirty_pages_ratelimited(mapping);
+		}
+	}
+
+	return VM_FAULT_WRITE;
+}
+
+/*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
  * and decrementing the shared-page counter for the old page.
@@ -2045,8 +2113,6 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *old_page, *new_page = NULL;
 	pte_t entry;
 	int ret = 0;
-	int page_mkwrite = 0;
-	struct page *dirty_page = NULL;
 	unsigned long mmun_start = 0;	/* For mmu_notifiers */
 	unsigned long mmun_end = 0;	/* For mmu_notifiers */
 	struct mem_cgroup *memcg;
@@ -2063,7 +2129,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 				     (VM_WRITE|VM_SHARED))
-			goto reuse;
+			return wp_page_reuse(mm, vma, address, page_table, ptl,
+					     orig_pte, old_page, 0, 0);
 		goto gotten;
 	}
 
@@ -2092,11 +2159,14 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			 */
 			page_move_anon_rmap(old_page, vma, address);
 			unlock_page(old_page);
-			goto reuse;
+			return wp_page_reuse(mm, vma, address, page_table, ptl,
+					     orig_pte, old_page, 0, 0);
 		}
 		unlock_page(old_page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
+		int page_mkwrite = 0;
+
 		/*
 		 * Only catch write-faults on shared writable pages,
 		 * read-only shared pages can get COWed by
@@ -2127,61 +2197,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 			page_mkwrite = 1;
 		}
-		dirty_page = old_page;
-		get_page(dirty_page);
-
-reuse:
-		/*
-		 * Clear the pages cpupid information as the existing
-		 * information potentially belongs to a now completely
-		 * unrelated process.
-		 */
-		if (old_page)
-			page_cpupid_xchg_last(old_page, (1 << LAST_CPUPID_SHIFT) - 1);
-
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = pte_mkyoung(orig_pte);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-		if (ptep_set_access_flags(vma, address, page_table, entry,1))
-			update_mmu_cache(vma, address, page_table);
-		pte_unmap_unlock(page_table, ptl);
-		ret |= VM_FAULT_WRITE;
-
-		if (!dirty_page)
-			return ret;
-
-		/*
-		 * Yes, Virginia, this is actually required to prevent a race
-		 * with clear_page_dirty_for_io() from clearing the page dirty
-		 * bit after it clear all dirty ptes, but before a racing
-		 * do_wp_page installs a dirty pte.
-		 *
-		 * do_shared_fault is protected similarly.
-		 */
-		if (!page_mkwrite) {
-			wait_on_page_locked(dirty_page);
-			set_page_dirty_balance(dirty_page);
-			/* file_update_time outside page_lock */
-			if (vma->vm_file)
-				file_update_time(vma->vm_file);
-		}
-		put_page(dirty_page);
-		if (page_mkwrite) {
-			struct address_space *mapping = dirty_page->mapping;
-
-			set_page_dirty(dirty_page);
-			unlock_page(dirty_page);
-			page_cache_release(dirty_page);
-			if (mapping)	{
-				/*
-				 * Some device drivers do not set page.mapping
-				 * but still dirty their pages
-				 */
-				balance_dirty_pages_ratelimited(mapping);
-			}
-		}
+		get_page(old_page);
 
-		return ret;
+		return wp_page_reuse(mm, vma, address, page_table, ptl,
+				     orig_pte, old_page, 1, page_mkwrite);
 	}
 
 	/*
-- 
1.7.11.2


* [PATCH v2 2/4] mm: Refactor do_wp_page - rewrite the unlock flow
  2014-12-01 20:58 [PATCH v2 0/4] Refactor do_wp_page, no functional change Shachar Raindel
  2014-12-01 20:58 ` [PATCH v2 1/4] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
@ 2014-12-01 20:58 ` Shachar Raindel
  2014-12-01 22:46   ` Rik van Riel
  2014-12-01 20:58 ` [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
  2014-12-01 20:58 ` [PATCH v2 4/4] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
  3 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2014-12-01 20:58 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

When do_wp_page is ending, in several cases it needs to unlock the
pages and ptls it was accessing.

Currently, this logic is "called" via a goto jump, which makes the
control flow of the function harder to follow. Readability is further
hampered by the unlock case containing a large amount of logic needed
in only one of the three cases.

Using goto for cleanup is generally allowed. However, moving the
trivial unlocking flows to the relevant call sites allows deeper
refactoring in the next patch.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6bb5d42..b42bec0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2112,7 +2112,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 {
 	struct page *old_page, *new_page = NULL;
 	pte_t entry;
-	int ret = 0;
+	int page_copied = 0;
 	unsigned long mmun_start = 0;	/* For mmu_notifiers */
 	unsigned long mmun_end = 0;	/* For mmu_notifiers */
 	struct mem_cgroup *memcg;
@@ -2147,7 +2147,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 							 &ptl);
 			if (!pte_same(*page_table, orig_pte)) {
 				unlock_page(old_page);
-				goto unlock;
+				pte_unmap_unlock(page_table, ptl);
+				page_cache_release(old_page);
+				return 0;
 			}
 			page_cache_release(old_page);
 		}
@@ -2192,7 +2194,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 							 &ptl);
 			if (!pte_same(*page_table, orig_pte)) {
 				unlock_page(old_page);
-				goto unlock;
+				pte_unmap_unlock(page_table, ptl);
+				page_cache_release(old_page);
+				return 0;
 			}
 
 			page_mkwrite = 1;
@@ -2292,29 +2296,28 @@ gotten:
 
 		/* Free the old page.. */
 		new_page = old_page;
-		ret |= VM_FAULT_WRITE;
+		page_copied = 1;
 	} else
 		mem_cgroup_cancel_charge(new_page, memcg);
 
 	if (new_page)
 		page_cache_release(new_page);
-unlock:
+
 	pte_unmap_unlock(page_table, ptl);
-	if (mmun_end > mmun_start)
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 	if (old_page) {
 		/*
 		 * Don't let another task, with possibly unlocked vma,
 		 * keep the mlocked page.
 		 */
-		if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
+		if (page_copied && (vma->vm_flags & VM_LOCKED)) {
 			lock_page(old_page);	/* LRU manipulation */
 			munlock_vma_page(old_page);
 			unlock_page(old_page);
 		}
 		page_cache_release(old_page);
 	}
-	return ret;
+	return page_copied ? VM_FAULT_WRITE : 0;
 oom_free_new:
 	page_cache_release(new_page);
 oom:
-- 
1.7.11.2


* [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow
  2014-12-01 20:58 [PATCH v2 0/4] Refactor do_wp_page, no functional change Shachar Raindel
  2014-12-01 20:58 ` [PATCH v2 1/4] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
  2014-12-01 20:58 ` [PATCH v2 2/4] mm: Refactor do_wp_page - rewrite the unlock flow Shachar Raindel
@ 2014-12-01 20:58 ` Shachar Raindel
  2014-12-02  2:53   ` Rik van Riel
  2014-12-01 20:58 ` [PATCH v2 4/4] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
  3 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2014-12-01 20:58 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

In some cases, do_wp_page has to copy the page suffering a write fault
to a new location. When the function logic decides to do this, it
jumps with a "goto" to the relevant code block. This makes the code
really hard to follow, and is also against the kernel coding style
guidelines.

This patch extracts the page copy and page table update logic to a
separate function. It also cleans up the naming, from "gotten" to
"wp_page_copy", and adds a few comments.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 265 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 147 insertions(+), 118 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b42bec0..c7c0df2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2088,6 +2088,146 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 /*
+ * Handle the case of a page which we actually need to copy to a new page.
+ *
+ * Called with mmap_sem locked and the old page referenced, but
+ * without the ptl held.
+ *
+ * High level logic flow:
+ *
+ * - Allocate a page, copy the content of the old page to the new one.
+ * - Handle book keeping and accounting - cgroups, mmu-notifiers, etc.
+ * - Take the PTL. If the pte changed, bail out and release the allocated page
+ * - If the pte is still the way we remember it, update the page table and all
+ *   relevant references. This includes dropping the reference the page-table
+ *   held to the old page, as well as updating the rmap.
+ * - In any case, unlock the PTL and drop the reference we took to the old page.
+ */
+static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long address, pte_t *page_table, pmd_t *pmd,
+			pte_t orig_pte, struct page *old_page)
+{
+	struct page *new_page = NULL;
+	spinlock_t *ptl = NULL;
+	pte_t entry;
+	int page_copied = 0;
+	const unsigned long mmun_start = address & PAGE_MASK;	/* For mmu_notifiers */
+	const unsigned long mmun_end = mmun_start + PAGE_SIZE;	/* For mmu_notifiers */
+	struct mem_cgroup *memcg;
+
+	if (unlikely(anon_vma_prepare(vma)))
+		goto oom;
+
+	if (is_zero_pfn(pte_pfn(orig_pte))) {
+		new_page = alloc_zeroed_user_highpage_movable(vma, address);
+		if (!new_page)
+			goto oom;
+	} else {
+		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+		if (!new_page)
+			goto oom;
+		cow_user_page(new_page, old_page, address, vma);
+	}
+	__SetPageUptodate(new_page);
+
+	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
+		goto oom_free_new;
+
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+
+	/*
+	 * Re-check the pte - we dropped the lock
+	 */
+	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (likely(pte_same(*page_table, orig_pte))) {
+		if (old_page) {
+			if (!PageAnon(old_page)) {
+				dec_mm_counter_fast(mm, MM_FILEPAGES);
+				inc_mm_counter_fast(mm, MM_ANONPAGES);
+			}
+		} else {
+			inc_mm_counter_fast(mm, MM_ANONPAGES);
+		}
+		flush_cache_page(vma, address, pte_pfn(orig_pte));
+		entry = mk_pte(new_page, vma->vm_page_prot);
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		/*
+		 * Clear the pte entry and flush it first, before updating the
+		 * pte with the new entry. This will avoid a race condition
+		 * seen in the presence of one thread doing SMC and another
+		 * thread doing COW.
+		 */
+		ptep_clear_flush(vma, address, page_table);
+		page_add_new_anon_rmap(new_page, vma, address);
+		mem_cgroup_commit_charge(new_page, memcg, false);
+		lru_cache_add_active_or_unevictable(new_page, vma);
+		/*
+		 * We call the notify macro here because, when using secondary
+		 * mmu page tables (such as kvm shadow page tables), we want the
+		 * new page to be mapped directly into the secondary page table.
+		 */
+		set_pte_at_notify(mm, address, page_table, entry);
+		update_mmu_cache(vma, address, page_table);
+		if (old_page) {
+			/*
+			 * Only after switching the pte to the new page may
+			 * we remove the mapcount here. Otherwise another
+			 * process may come and find the rmap count decremented
+			 * before the pte is switched to the new page, and
+			 * "reuse" the old page writing into it while our pte
+			 * here still points into it and can be read by other
+			 * threads.
+			 *
+			 * The critical issue is to order this
+			 * page_remove_rmap with the ptp_clear_flush above.
+			 * Those stores are ordered by (if nothing else,)
+			 * the barrier present in the atomic_add_negative
+			 * in page_remove_rmap.
+			 *
+			 * Then the TLB flush in ptep_clear_flush ensures that
+			 * no process can access the old page before the
+			 * decremented mapcount is visible. And the old page
+			 * cannot be reused until after the decremented
+			 * mapcount is visible. So transitively, TLBs to
+			 * old page will be flushed before it can be reused.
+			 */
+			page_remove_rmap(old_page);
+		}
+
+		/* Free the old page.. */
+		new_page = old_page;
+		page_copied = 1;
+	} else {
+		mem_cgroup_cancel_charge(new_page, memcg);
+	}
+
+	if (new_page)
+		page_cache_release(new_page);
+
+	pte_unmap_unlock(page_table, ptl);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	if (old_page) {
+		/*
+		 * Don't let another task, with possibly unlocked vma,
+		 * keep the mlocked page.
+		 */
+		if (page_copied && (vma->vm_flags & VM_LOCKED)) {
+			lock_page(old_page);	/* LRU manipulation */
+			munlock_vma_page(old_page);
+			unlock_page(old_page);
+		}
+		page_cache_release(old_page);
+	}
+	return page_copied ? VM_FAULT_WRITE : 0;
+oom_free_new:
+	page_cache_release(new_page);
+oom:
+	if (old_page)
+		page_cache_release(old_page);
+	return VM_FAULT_OOM;
+}
+
+/*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
  * and decrementing the shared-page counter for the old page.
@@ -2110,12 +2250,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		spinlock_t *ptl, pte_t orig_pte)
 	__releases(ptl)
 {
-	struct page *old_page, *new_page = NULL;
-	pte_t entry;
-	int page_copied = 0;
-	unsigned long mmun_start = 0;	/* For mmu_notifiers */
-	unsigned long mmun_end = 0;	/* For mmu_notifiers */
-	struct mem_cgroup *memcg;
+	struct page *old_page;
 
 	old_page = vm_normal_page(vma, address, orig_pte);
 	if (!old_page) {
@@ -2131,7 +2266,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				     (VM_WRITE|VM_SHARED))
 			return wp_page_reuse(mm, vma, address, page_table, ptl,
 					     orig_pte, old_page, 0, 0);
-		goto gotten;
+
+		pte_unmap_unlock(page_table, ptl);
+		return wp_page_copy(mm, vma, address, page_table, pmd,
+				    orig_pte, old_page);
 	}
 
 	/*
@@ -2211,119 +2349,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * Ok, we need to copy. Oh, well..
 	 */
 	page_cache_get(old_page);
-gotten:
-	pte_unmap_unlock(page_table, ptl);
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
-
-	if (is_zero_pfn(pte_pfn(orig_pte))) {
-		new_page = alloc_zeroed_user_highpage_movable(vma, address);
-		if (!new_page)
-			goto oom;
-	} else {
-		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
-		if (!new_page)
-			goto oom;
-		cow_user_page(new_page, old_page, address, vma);
-	}
-	__SetPageUptodate(new_page);
-
-	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
-		goto oom_free_new;
-
-	mmun_start  = address & PAGE_MASK;
-	mmun_end    = mmun_start + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
-
-	/*
-	 * Re-check the pte - we dropped the lock
-	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (likely(pte_same(*page_table, orig_pte))) {
-		if (old_page) {
-			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
-				inc_mm_counter_fast(mm, MM_ANONPAGES);
-			}
-		} else
-			inc_mm_counter_fast(mm, MM_ANONPAGES);
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = mk_pte(new_page, vma->vm_page_prot);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-		/*
-		 * Clear the pte entry and flush it first, before updating the
-		 * pte with the new entry. This will avoid a race condition
-		 * seen in the presence of one thread doing SMC and another
-		 * thread doing COW.
-		 */
-		ptep_clear_flush(vma, address, page_table);
-		page_add_new_anon_rmap(new_page, vma, address);
-		mem_cgroup_commit_charge(new_page, memcg, false);
-		lru_cache_add_active_or_unevictable(new_page, vma);
-		/*
-		 * We call the notify macro here because, when using secondary
-		 * mmu page tables (such as kvm shadow page tables), we want the
-		 * new page to be mapped directly into the secondary page table.
-		 */
-		set_pte_at_notify(mm, address, page_table, entry);
-		update_mmu_cache(vma, address, page_table);
-		if (old_page) {
-			/*
-			 * Only after switching the pte to the new page may
-			 * we remove the mapcount here. Otherwise another
-			 * process may come and find the rmap count decremented
-			 * before the pte is switched to the new page, and
-			 * "reuse" the old page writing into it while our pte
-			 * here still points into it and can be read by other
-			 * threads.
-			 *
-			 * The critical issue is to order this
-			 * page_remove_rmap with the ptp_clear_flush above.
-			 * Those stores are ordered by (if nothing else,)
-			 * the barrier present in the atomic_add_negative
-			 * in page_remove_rmap.
-			 *
-			 * Then the TLB flush in ptep_clear_flush ensures that
-			 * no process can access the old page before the
-			 * decremented mapcount is visible. And the old page
-			 * cannot be reused until after the decremented
-			 * mapcount is visible. So transitively, TLBs to
-			 * old page will be flushed before it can be reused.
-			 */
-			page_remove_rmap(old_page);
-		}
-
-		/* Free the old page.. */
-		new_page = old_page;
-		page_copied = 1;
-	} else
-		mem_cgroup_cancel_charge(new_page, memcg);
-
-	if (new_page)
-		page_cache_release(new_page);
 
 	pte_unmap_unlock(page_table, ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
-	if (old_page) {
-		/*
-		 * Don't let another task, with possibly unlocked vma,
-		 * keep the mlocked page.
-		 */
-		if (page_copied && (vma->vm_flags & VM_LOCKED)) {
-			lock_page(old_page);	/* LRU manipulation */
-			munlock_vma_page(old_page);
-			unlock_page(old_page);
-		}
-		page_cache_release(old_page);
-	}
-	return page_copied ? VM_FAULT_WRITE : 0;
-oom_free_new:
-	page_cache_release(new_page);
-oom:
-	if (old_page)
-		page_cache_release(old_page);
-	return VM_FAULT_OOM;
+	return wp_page_copy(mm, vma, address, page_table, pmd,
+			    orig_pte, old_page);
 }
 
 static void unmap_mapping_range_vma(struct vm_area_struct *vma,
-- 
1.7.11.2


* [PATCH v2 4/4] mm: Refactor do_wp_page handling of shared vma into a function
  2014-12-01 20:58 [PATCH v2 0/4] Refactor do_wp_page, no functional change Shachar Raindel
                   ` (2 preceding siblings ...)
  2014-12-01 20:58 ` [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
@ 2014-12-01 20:58 ` Shachar Raindel
  2014-12-02  2:57   ` Rik van Riel
  3 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2014-12-01 20:58 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, mgorman, riel, ak, matthew.r.wilcox,
	dave.hansen, n-horiguchi, akpm, torvalds, haggaie, aarcange,
	pfeiner, hannes, sagig, walken, raindel

The do_wp_page function is extremely long. Extract the logic for
handling a page belonging to a shared vma into a function of its own.

This improves the readability of the code without making any
functional change.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 mm/memory.c | 86 ++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 48 insertions(+), 38 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c7c0df2..e730628 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2227,6 +2227,52 @@ oom:
 	return VM_FAULT_OOM;
 }
 
+static int wp_page_shared(struct mm_struct *mm, struct vm_area_struct *vma,
+			  unsigned long address, pte_t *page_table,
+			  pmd_t *pmd, spinlock_t *ptl, pte_t orig_pte,
+			  struct page *old_page)
+	__releases(ptl)
+{
+	int page_mkwrite = 0;
+	int ret;
+
+	/*
+	 * Only catch write-faults on shared writable pages, read-only shared
+	 * pages can get COWed by get_user_pages(.write=1, .force=1).
+	 */
+	if (!vma->vm_ops || !vma->vm_ops->page_mkwrite)
+		goto no_mkwrite;
+
+	page_cache_get(old_page);
+	pte_unmap_unlock(page_table, ptl);
+	ret = do_page_mkwrite(vma, old_page, address);
+	if (unlikely(!ret || (ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
+		page_cache_release(old_page);
+		return ret;
+	}
+	/*
+	 * Since we dropped the lock we need to revalidate
+	 * the PTE as someone else may have changed it.  If
+	 * they did, we just return, as we can count on the
+	 * MMU to tell us if they didn't also make it writable.
+	 */
+	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (!pte_same(*page_table, orig_pte)) {
+		unlock_page(old_page);
+		pte_unmap_unlock(page_table, ptl);
+		page_cache_release(old_page);
+		return 0;
+	}
+
+	page_mkwrite = 1;
+
+no_mkwrite:
+	get_page(old_page);
+
+	return wp_page_reuse(mm, vma, address, page_table, ptl, orig_pte,
+			     old_page, 1, page_mkwrite);
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2305,44 +2351,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		unlock_page(old_page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
-		int page_mkwrite = 0;
-
-		/*
-		 * Only catch write-faults on shared writable pages,
-		 * read-only shared pages can get COWed by
-		 * get_user_pages(.write=1, .force=1).
-		 */
-		if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
-			int tmp;
-			page_cache_get(old_page);
-			pte_unmap_unlock(page_table, ptl);
-			tmp = do_page_mkwrite(vma, old_page, address);
-			if (unlikely(!tmp || (tmp &
-					(VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
-				page_cache_release(old_page);
-				return tmp;
-			}
-			/*
-			 * Since we dropped the lock we need to revalidate
-			 * the PTE as someone else may have changed it.  If
-			 * they did, we just return, as we can count on the
-			 * MMU to tell us if they didn't also make it writable.
-			 */
-			page_table = pte_offset_map_lock(mm, pmd, address,
-							 &ptl);
-			if (!pte_same(*page_table, orig_pte)) {
-				unlock_page(old_page);
-				pte_unmap_unlock(page_table, ptl);
-				page_cache_release(old_page);
-				return 0;
-			}
-
-			page_mkwrite = 1;
-		}
-		get_page(old_page);
-
-		return wp_page_reuse(mm, vma, address, page_table, ptl,
-				     orig_pte, old_page, 1, page_mkwrite);
+		return wp_page_shared(mm, vma, address, page_table, pmd,
+				      ptl, orig_pte, old_page);
 	}
 
 	/*
-- 
1.7.11.2


* Re: [PATCH v2 1/4] mm: Refactor do_wp_page, extract the reuse case
  2014-12-01 20:58 ` [PATCH v2 1/4] mm: Refactor do_wp_page, extract the reuse case Shachar Raindel
@ 2014-12-01 22:23   ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2014-12-01 22:23 UTC (permalink / raw)
  To: Shachar Raindel, linux-mm
  Cc: kirill.shutemov, mgorman, ak, matthew.r.wilcox, dave.hansen,
	n-horiguchi, akpm, torvalds, haggaie, aarcange, pfeiner, hannes,
	sagig, walken


On 12/01/2014 03:58 PM, Shachar Raindel wrote:
> When do_wp_page is ending, in several cases it needs to reuse the 
> existing page. This is achieved by making the page table writable, 
> and possibly updating the page-cache state.
> 
> Currently, this logic is "called" via a goto jump, which makes
> the control flow of the function harder to follow. It also goes
> against the kernel coding style guidelines for goto usage.
> 
> As the code can easily be refactored into a specialized function, 
> refactor it out and simplify the code flow in do_wp_page.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>

Acked-by: Rik van Riel <riel@redhat.com>


--
All rights reversed


* Re: [PATCH v2 2/4] mm: Refactor do_wp_page - rewrite the unlock flow
  2014-12-01 20:58 ` [PATCH v2 2/4] mm: Refactor do_wp_page - rewrite the unlock flow Shachar Raindel
@ 2014-12-01 22:46   ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2014-12-01 22:46 UTC (permalink / raw)
  To: Shachar Raindel, linux-mm
  Cc: kirill.shutemov, mgorman, ak, matthew.r.wilcox, dave.hansen,
	n-horiguchi, akpm, torvalds, haggaie, aarcange, pfeiner, hannes,
	sagig, walken


On 12/01/2014 03:58 PM, Shachar Raindel wrote:
> When do_wp_page is ending, in several cases it needs to unlock the 
> pages and ptls it was accessing.
> 
> Currently, this logic is "called" via a goto jump, which makes
> the control flow of the function harder to follow. Readability is
> further hampered by the unlock case containing a large amount of
> logic needed in only one of the three cases.
> 
> Using goto for cleanup is generally allowed. However, moving the
> trivial unlocking flows to the relevant call sites allows deeper
> refactoring in the next patch.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>

Acked-by: Rik van Riel <riel@redhat.com>

--
All rights reversed


* Re: [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow
  2014-12-01 20:58 ` [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow Shachar Raindel
@ 2014-12-02  2:53   ` Rik van Riel
  2014-12-02  9:09     ` Shachar Raindel
  0 siblings, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2014-12-02  2:53 UTC (permalink / raw)
  To: Shachar Raindel, linux-mm
  Cc: kirill.shutemov, mgorman, ak, matthew.r.wilcox, dave.hansen,
	n-horiguchi, akpm, torvalds, haggaie, aarcange, pfeiner, hannes,
	sagig, walken


On 12/01/2014 03:58 PM, Shachar Raindel wrote:
> In some cases, do_wp_page has to copy the page suffering a write
> fault to a new location. When the function logic decides to do
> this, it jumps with a "goto" to the relevant code block. This makes
> the code really hard to follow, and is also against the kernel
> coding style guidelines.
> 
> This patch extracts the page copy and page table update logic to a
> separate function. It also cleans up the naming, from "gotten" to
> "wp_page_copy", and adds a few comments.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> ---
>  mm/memory.c | 265 +++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 147 insertions(+), 118 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index b42bec0..c7c0df2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2088,6 +2088,146 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
>  }
>  
>  /*
> + * Handle the case of a page which we actually need to copy to a new page.
> + *
> + * Called with mmap_sem locked and the old page referenced, but
> + * without the ptl held.
> + *
> + * High level logic flow:
> + *
> + * - Allocate a page, copy the content of the old page to the new one.
> + * - Handle book keeping and accounting - cgroups, mmu-notifiers, etc.
> + * - Take the PTL. If the pte changed, bail out and release the allocated page
> + * - If the pte is still the way we remember it, update the page table and all
> + *   relevant references. This includes dropping the reference the page-table
> + *   held to the old page, as well as updating the rmap.
> + * - In any case, unlock the PTL and drop the reference we took to the old page.
> + */
> +static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
> +			unsigned long address, pte_t *page_table, pmd_t *pmd,
> +			pte_t orig_pte, struct page *old_page)
> +{
> +	struct page *new_page = NULL;
> +	spinlock_t *ptl = NULL;
> +	pte_t entry;
> +	int page_copied = 0;
> +	const unsigned long mmun_start = address & PAGE_MASK;	/* For mmu_notifiers */
> +	const unsigned long mmun_end = mmun_start + PAGE_SIZE;	/* For mmu_notifiers */
> +	struct mem_cgroup *memcg;
> +
> +	if (unlikely(anon_vma_prepare(vma)))
> +		goto oom;
> +
> +	if (is_zero_pfn(pte_pfn(orig_pte))) {
> +		new_page = alloc_zeroed_user_highpage_movable(vma, address);
> +		if (!new_page)
> +			goto oom;
> +	} else {
> +		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> +		if (!new_page)
> +			goto oom;
> +		cow_user_page(new_page, old_page, address, vma);
> +	}
> +	__SetPageUptodate(new_page);
> +
> +	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
> +		goto oom_free_new;
> +
> +	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

I believe the mmu_notifier_invalidate_range_start & _end
functions can be moved inside the pte_same(*page_table, orig_pte)
branch. There is no reason to call those functions if we do not
modify the page table entry.

This is not something introduced by your patch, but you might as
well fix it while you're touching the code :)
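
In sketch form, the suggested move would look roughly like this
(an illustrative sketch only, not a tested patch):

	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (likely(pte_same(*page_table, orig_pte))) {
		/* notify secondary MMUs only when the pte will change */
		mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
		/* ... existing pte/rmap update logic ... */
		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
	}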

Other than that:

Acked-by: Rik van Riel <riel@redhat.com>

--
All rights reversed


* Re: [PATCH v2 4/4] mm: Refactor do_wp_page handling of shared vma into a function
  2014-12-01 20:58 ` [PATCH v2 4/4] mm: Refactor do_wp_page handling of shared vma into a function Shachar Raindel
@ 2014-12-02  2:57   ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2014-12-02  2:57 UTC (permalink / raw)
  To: Shachar Raindel, linux-mm
  Cc: kirill.shutemov, mgorman, ak, matthew.r.wilcox, dave.hansen,
	n-horiguchi, akpm, torvalds, haggaie, aarcange, pfeiner, hannes,
	sagig, walken


On 12/01/2014 03:58 PM, Shachar Raindel wrote:
> The do_wp_page function is extremely long. Extract the logic for 
> handling a page belonging to a shared vma into a function of its
> own.
> 
> This improves the readability of the code without making any
> functional change.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>

Acked-by: Rik van Riel <riel@redhat.com>

--
All rights reversed


* RE: [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow
  2014-12-02  2:53   ` Rik van Riel
@ 2014-12-02  9:09     ` Shachar Raindel
  2014-12-02 14:56       ` Rik van Riel
  0 siblings, 1 reply; 11+ messages in thread
From: Shachar Raindel @ 2014-12-02  9:09 UTC (permalink / raw)
  To: Rik van Riel, linux-mm
  Cc: kirill.shutemov, mgorman, ak, matthew.r.wilcox, dave.hansen,
	n-horiguchi, akpm, torvalds, Haggai Eran, aarcange, pfeiner,
	hannes, Sagi Grimberg, walken, Jerome Glisse


> -----Original Message-----
> From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On
> Behalf Of Rik van Riel
> Sent: Tuesday, December 02, 2014 4:53 AM
> To: Shachar Raindel; linux-mm@kvack.org
> Cc: kirill.shutemov@linux.intel.com; mgorman@suse.de;
> ak@linux.intel.com; matthew.r.wilcox@intel.com;
> dave.hansen@linux.intel.com; n-horiguchi@ah.jp.nec.com; akpm@linux-
> foundation.org; torvalds@linux-foundation.org; Haggai Eran;
> aarcange@redhat.com; pfeiner@google.com; hannes@cmpxchg.org; Sagi
> Grimberg; walken@google.com
> Subject: Re: [PATCH v2 3/4] mm: refactor do_wp_page, extract the page
> copy flow
> 
> 
> On 12/01/2014 03:58 PM, Shachar Raindel wrote:
> > In some cases, do_wp_page has to copy the page suffering a write
> > fault to a new location. When the function logic decides to do
> > this, it jumps with a "goto" to the relevant code block. This
> > makes the code really hard to follow, and is also against the
> > kernel coding style guidelines.
> >
> > This patch extracts the page copy and page table update logic to a
> > separate function. It also cleans up the naming, from "gotten" to
> > "wp_page_copy", and adds a few comments.
> >
> > Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> > ---
> >  mm/memory.c | 265 +++++++++++++++++++++++++++++++++---------------------------
> >  1 file changed, 147 insertions(+), 118 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index b42bec0..c7c0df2 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2088,6 +2088,146 @@ static int wp_page_reuse(struct mm_struct *mm, struct vm_area_struct *vma,
> >  }
> >
> >  /*
> > + * Handle the case of a page which we actually need to copy to a new page.
> > + *
> > + * Called with mmap_sem locked and the old page referenced, but
> > + * without the ptl held.
> > + *
> > + * High level logic flow:
> > + *
> > + * - Allocate a page, copy the content of the old page to the new one.
> > + * - Handle book keeping and accounting - cgroups, mmu-notifiers, etc.
> > + * - Take the PTL. If the pte changed, bail out and release the allocated page
> > + * - If the pte is still the way we remember it, update the page table and all
> > + *   relevant references. This includes dropping the reference the page-table
> > + *   held to the old page, as well as updating the rmap.
> > + * - In any case, unlock the PTL and drop the reference we took to the old page.
> > + */
> > +static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
> > +			unsigned long address, pte_t *page_table, pmd_t *pmd,
> > +			pte_t orig_pte, struct page *old_page)
> > +{
> > +	struct page *new_page = NULL;
> > +	spinlock_t *ptl = NULL;
> > +	pte_t entry;
> > +	int page_copied = 0;
> > +	const unsigned long mmun_start = address & PAGE_MASK;	/* For mmu_notifiers */
> > +	const unsigned long mmun_end = mmun_start + PAGE_SIZE;	/* For mmu_notifiers */
> > +	struct mem_cgroup *memcg;
> > +
> > +	if (unlikely(anon_vma_prepare(vma)))
> > +		goto oom;
> > +
> > +	if (is_zero_pfn(pte_pfn(orig_pte))) {
> > +		new_page = alloc_zeroed_user_highpage_movable(vma, address);
> > +		if (!new_page)
> > +			goto oom;
> > +	} else {
> > +		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> > +		if (!new_page)
> > +			goto oom;
> > +		cow_user_page(new_page, old_page, address, vma);
> > +	}
> > +	__SetPageUptodate(new_page);
> > +
> > +	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
> > +		goto oom_free_new;
> > +
> > +	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
> 
> I believe the mmu_notifier_invalidate_range_start & _end
> functions can be moved inside the pte_same(*page_table, orig_pte)
> branch. There is no reason to call those functions if we do not
> modify the page table entry.
> 

There is a critical reason for this - moving the MMU notifier calls
there would place them under the page table lock, where they cannot
sleep. This would prevent hardware devices that keep secondary PTEs
from waiting for an interrupt to signal that the invalidation was
completed. Such waiting is required, for example, by the ODP patch set
(http://www.spinics.net/lists/linux-rdma/msg22044.html ) and
by the HMM patch set (http://comments.gmane.org/gmane.linux.kernel.mm/116584 ).
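
As a minimal sketch of why this matters (hypothetical driver code:
dev_ctx, dev_queue_invalidate and the IRQ-side completion are made up,
only the notifier callback signature matches the kernel API):

	#include <linux/mmu_notifier.h>
	#include <linux/completion.h>

	struct dev_ctx {
		struct mmu_notifier mn;
		struct completion inval_done;	/* completed from the device IRQ handler */
	};

	static void dev_invalidate_range_start(struct mmu_notifier *mn,
					       struct mm_struct *mm,
					       unsigned long start,
					       unsigned long end)
	{
		struct dev_ctx *ctx = container_of(mn, struct dev_ctx, mn);

		/* ask the device to drop its secondary PTEs for the range */
		dev_queue_invalidate(ctx, start, end);
		/* sleeps until the device acks - illegal under the ptl spinlock */
		wait_for_completion(&ctx->inval_done);
	}

	static const struct mmu_notifier_ops dev_mn_ops = {
		.invalidate_range_start	= dev_invalidate_range_start,
	};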

> This is not something introduced by your patch, but you might as
> well fix it while you're touching the code :)

This happened to be introduced by Haggai Eran 
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6bdb913f0a ),
who kindly reviewed this patchset before v0 was sent.

As mentioned above, I would prefer not to break the sleepability of the
MMU notifiers in this patchset, which targets coding style cleanup only.
If you want to discuss the issue further, can we split it into a separate
thread, so that it does not block the ack on this patch?

> 
> Other than that:
> 
> Acked-by: Rik van Riel <riel@redhat.com>
> 
Thanks.

--Shachar

* Re: [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow
  2014-12-02  9:09     ` Shachar Raindel
@ 2014-12-02 14:56       ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2014-12-02 14:56 UTC (permalink / raw)
  To: Shachar Raindel, linux-mm
  Cc: kirill.shutemov, mgorman, ak, matthew.r.wilcox, dave.hansen,
	n-horiguchi, akpm, torvalds, Haggai Eran, aarcange, pfeiner,
	hannes, Sagi Grimberg, walken, Jerome Glisse


On 12/02/2014 04:09 AM, Shachar Raindel wrote:

>> I believe the mmu_notifier_invalidate_range_start & _end 
>> functions can be moved inside the pte_same(*page_table,
>> orig_pte) branch. There is no reason to call those functions if
>> we do not modify the page table entry.
>> 
> 
> There is a critical reason for this - moving the MMU notifier
> calls there would place them under the page table lock, where they
> cannot sleep. This would prevent hardware devices that keep
> secondary PTEs from waiting for an interrupt to signal that the
> invalidation was completed. Such waiting is required, for example,
> by the ODP patch set
> (http://www.spinics.net/lists/linux-rdma/msg22044.html ) and by the
> HMM patch set
> (http://comments.gmane.org/gmane.linux.kernel.mm/116584 ).

Ahhhh, that explains!

Acked-by: Rik van Riel <riel@redhat.com>

--
All rights reversed

