* [PATCH uprobe, thp v2 0/5] THP aware uprobe
@ 2019-06-04 16:51 Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 1/5] mm: move memcmp_pages() and pages_identical() Song Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Song Liu @ 2019-06-04 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: peterz, oleg, rostedt, mhiramat, kirill.shutemov, kernel-team,
	william.kucharski, Song Liu

This set makes uprobe aware of THPs.

Currently, when uprobe is attached to text on THP, the page is split by
FOLL_SPLIT. As a result, uprobe eliminates the performance benefit of THP.

Instead of FOLL_SPLIT, this set introduces FOLL_SPLIT_PMD, which only splits
the PMD for uprobe. After all uprobes within the THP are removed, the PTEs
are regrouped into a huge PMD.

Note that, with uprobes attached, the process runs with PTEs for the huge
page. The performance benefit of THP is recovered _after_ all uprobes on
the huge page are detached.
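
To make the lifecycle described here concrete, below is a minimal userspace
sketch (not kernel code). The struct and the model_* helpers are hypothetical
names invented for illustration. The series itself decides whether to collapse
by inspecting the page table, not by keeping a counter, so treat this only as
a picture of the intended states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct thp_mapping {
	bool pmd_mapped;		/* true: one huge PMD, false: PTE-mapped */
	unsigned int nr_uprobes;	/* uprobes currently attached in this THP */
};

/* Attaching splits only the PMD; the huge page itself stays intact. */
static void model_attach(struct thp_mapping *m)
{
	m->pmd_mapped = false;
	m->nr_uprobes++;
}

/* Detaching the last uprobe lets the PTEs be regrouped into a huge PMD. */
static void model_detach(struct thp_mapping *m)
{
	assert(m->nr_uprobes > 0);
	if (--m->nr_uprobes == 0)
		m->pmd_mapped = true;
}
```

With two uprobes attached, the mapping stays PTE-mapped after the first
detach and only returns to a huge PMD after the second.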

This set (plus a few small debug patches) is also available at

   https://github.com/liu-song-6/linux/tree/uprobe-thp

Changes since v1:
1. introduce FOLL_SPLIT_PMD, instead of modifying split_huge_pmd*();
2. reuse pages_identical() from ksm.c;
3. rewrite most of try_collapse_huge_pmd().

Song Liu (5):
  mm: move memcmp_pages() and pages_identical()
  uprobe: use original page when all uprobes are removed
  mm, thp: introduce FOLL_SPLIT_PMD
  uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT
  uprobe: collapse THP pmd after removing all uprobes

 include/linux/huge_mm.h |  7 +++++
 include/linux/mm.h      |  8 +++++
 kernel/events/uprobes.c | 53 +++++++++++++++++++++++--------
 mm/gup.c                | 15 +++++++--
 mm/huge_memory.c        | 70 +++++++++++++++++++++++++++++++++++++++++
 mm/ksm.c                | 18 -----------
 mm/util.c               | 13 ++++++++
 7 files changed, 150 insertions(+), 34 deletions(-)

--
2.17.1


* [PATCH uprobe, thp v2 1/5] mm: move memcmp_pages() and pages_identical()
  2019-06-04 16:51 [PATCH uprobe, thp v2 0/5] THP aware uprobe Song Liu
@ 2019-06-04 16:51 ` Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 2/5] uprobe: use original page when all uprobes are removed Song Liu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2019-06-04 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: peterz, oleg, rostedt, mhiramat, kirill.shutemov, kernel-team,
	william.kucharski, Song Liu

This patch moves memcmp_pages() to mm/util.c and pages_identical() to
mm.h, so that we can use them in other files.
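
For readers who want to try the semantics outside the kernel, here is a rough
userspace analogue of the two helpers. It works on plain buffers instead of
struct page, so there is no kmap_atomic(); MODEL_PAGE_SIZE and the model_*
names are assumptions made for this sketch.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Userspace stand-in for memcmp_pages(): compare two page-sized buffers. */
static int model_memcmp_pages(const unsigned char *page1,
			      const unsigned char *page2)
{
	assert(page1 && page2);
	return memcmp(page1, page2, MODEL_PAGE_SIZE);
}

/* Mirrors pages_identical(): non-zero when the contents match. */
static int model_pages_identical(const unsigned char *page1,
				 const unsigned char *page2)
{
	return !model_memcmp_pages(page1, page2);
}
```

As in the kernel version, the "identical" helper is just the negation of the
memcmp()-style result, so callers can use it directly in a condition.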

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/mm.h |  7 +++++++
 mm/ksm.c           | 18 ------------------
 mm/util.c          | 13 +++++++++++++
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0f57b5dfb331..1bdaf1872492 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2881,5 +2881,12 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int memcmp_pages(struct page *page1, struct page *page2);
+
+static inline int pages_identical(struct page *page1, struct page *page2)
+{
+	return !memcmp_pages(page1, page2);
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/ksm.c b/mm/ksm.c
index 81c20ed57bf6..6f153f976c4c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1030,24 +1030,6 @@ static u32 calc_checksum(struct page *page)
 	return checksum;
 }
 
-static int memcmp_pages(struct page *page1, struct page *page2)
-{
-	char *addr1, *addr2;
-	int ret;
-
-	addr1 = kmap_atomic(page1);
-	addr2 = kmap_atomic(page2);
-	ret = memcmp(addr1, addr2, PAGE_SIZE);
-	kunmap_atomic(addr2);
-	kunmap_atomic(addr1);
-	return ret;
-}
-
-static inline int pages_identical(struct page *page1, struct page *page2)
-{
-	return !memcmp_pages(page1, page2);
-}
-
 static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 			      pte_t *orig_pte)
 {
diff --git a/mm/util.c b/mm/util.c
index c2fb8fd807df..c122718de550 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -801,3 +801,16 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
 out:
 	return res;
 }
+
+int memcmp_pages(struct page *page1, struct page *page2)
+{
+	char *addr1, *addr2;
+	int ret;
+
+	addr1 = kmap_atomic(page1);
+	addr2 = kmap_atomic(page2);
+	ret = memcmp(addr1, addr2, PAGE_SIZE);
+	kunmap_atomic(addr2);
+	kunmap_atomic(addr1);
+	return ret;
+}
-- 
2.17.1



* [PATCH uprobe, thp v2 2/5] uprobe: use original page when all uprobes are removed
  2019-06-04 16:51 [PATCH uprobe, thp v2 0/5] THP aware uprobe Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 1/5] mm: move memcmp_pages() and pages_identical() Song Liu
@ 2019-06-04 16:51 ` Song Liu
  2019-06-05 10:02   ` Oleg Nesterov
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 3/5] mm, thp: introduce FOLL_SPLIT_PMD Song Liu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Song Liu @ 2019-06-04 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: peterz, oleg, rostedt, mhiramat, kirill.shutemov, kernel-team,
	william.kucharski, Song Liu

Currently, uprobe swaps the target page with an anonymous page in both
install_breakpoint() and remove_breakpoint(). When all uprobes on a page
are removed, the given mm is still using an anonymous page (not the
original page).

This patch allows uprobe to use the original page when possible (all uprobes
on the page are already removed).
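
The decision can be pictured with a small userspace sketch. It assumes 4 KiB
pages and a one-byte opcode, and model_can_use_orig() is a hypothetical helper
name, not something in uprobes.c. The idea: build the new page contents, then
compare them with the page-cache copy; only when they match can the original
page be mapped back.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/*
 * Hypothetical model of the check added to uprobe_write_opcode(): build the
 * new page contents in @scratch, then report whether they match @orig (the
 * page-cache copy). A one-byte opcode is assumed for simplicity.
 */
static bool model_can_use_orig(const unsigned char *cur,
			       const unsigned char *orig,
			       unsigned char *scratch,
			       size_t offset, unsigned char opcode)
{
	assert(offset < MODEL_PAGE_SIZE);
	memcpy(scratch, cur, MODEL_PAGE_SIZE);	/* like copy_highpage() */
	scratch[offset] = opcode;		/* like copy_to_page() */
	return memcmp(scratch, orig, MODEL_PAGE_SIZE) == 0;
}
```

If two breakpoints sit on the same page, restoring the first one still leaves
a difference, so the anonymous page is kept; restoring the last one makes the
contents match and the original page can be reused.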

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/uprobes.c | 42 ++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 78f61bfc6b79..3fca7c55d370 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -160,16 +160,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	int err;
 	struct mmu_notifier_range range;
 	struct mem_cgroup *memcg;
+	bool orig = new_page->mapping != NULL;  /* new_page == orig_page */
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
 
 	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
 
-	err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg,
-			false);
-	if (err)
-		return err;
+	if (!orig) {
+		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
+					    &memcg, false);
+		if (err)
+			return err;
+	}
 
 	/* For try_to_free_swap() and munlock_vma_page() below */
 	lock_page(old_page);
@@ -177,15 +180,22 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	mmu_notifier_invalidate_range_start(&range);
 	err = -EAGAIN;
 	if (!page_vma_mapped_walk(&pvmw)) {
-		mem_cgroup_cancel_charge(new_page, memcg, false);
+		if (!orig)
+			mem_cgroup_cancel_charge(new_page, memcg, false);
 		goto unlock;
 	}
 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
 
 	get_page(new_page);
-	page_add_new_anon_rmap(new_page, vma, addr, false);
-	mem_cgroup_commit_charge(new_page, memcg, false, false);
-	lru_cache_add_active_or_unevictable(new_page, vma);
+	if (orig) {
+		page_add_file_rmap(new_page, false);
+		inc_mm_counter(mm, mm_counter_file(new_page));
+		dec_mm_counter(mm, MM_ANONPAGES);
+	} else {
+		page_add_new_anon_rmap(new_page, vma, addr, false);
+		mem_cgroup_commit_charge(new_page, memcg, false, false);
+		lru_cache_add_active_or_unevictable(new_page, vma);
+	}
 
 	if (!PageAnon(old_page)) {
 		dec_mm_counter(mm, mm_counter_file(old_page));
@@ -461,9 +471,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 			unsigned long vaddr, uprobe_opcode_t opcode)
 {
 	struct uprobe *uprobe;
-	struct page *old_page, *new_page;
+	struct page *old_page, *new_page, *orig_page = NULL;
 	struct vm_area_struct *vma;
 	int ret, is_register, ref_ctr_updated = 0;
+	pgoff_t index;
 
 	is_register = is_swbp_insn(&opcode);
 	uprobe = container_of(auprobe, struct uprobe, arch);
@@ -501,6 +512,19 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	copy_highpage(new_page, old_page);
 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 
+	index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
+	orig_page = find_get_page(vma->vm_file->f_inode->i_mapping, index);
+	if (orig_page) {
+		if (pages_identical(new_page, orig_page)) {
+			/* if new_page matches orig_page, use orig_page */
+			put_page(new_page);
+			new_page = orig_page;
+		} else {
+			put_page(orig_page);
+			orig_page = NULL;
+		}
+	}
+
 	ret = __replace_page(vma, vaddr, old_page, new_page);
 	put_page(new_page);
 put_old:
-- 
2.17.1



* [PATCH uprobe, thp v2 3/5] mm, thp: introduce FOLL_SPLIT_PMD
  2019-06-04 16:51 [PATCH uprobe, thp v2 0/5] THP aware uprobe Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 1/5] mm: move memcmp_pages() and pages_identical() Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 2/5] uprobe: use original page when all uprobes are removed Song Liu
@ 2019-06-04 16:51 ` Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 4/5] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 5/5] uprobe: collapse THP pmd after removing all uprobes Song Liu
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2019-06-04 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: peterz, oleg, rostedt, mhiramat, kirill.shutemov, kernel-team,
	william.kucharski, Song Liu

This patch introduces a new foll_flag: FOLL_SPLIT_PMD. As the name says,
FOLL_SPLIT_PMD splits the huge pmd for the given mm_struct; the underlying
huge page stays as-is.

FOLL_SPLIT_PMD is useful for cases where we need to use regular pages,
but would switch back to the huge page and huge pmd later on. One such example
is uprobe. The following patches use FOLL_SPLIT_PMD in uprobe.
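
The branch order this adds to follow_pmd_mask() can be sketched in userspace
as follows. The enum and model_pmd_action() are hypothetical names for this
illustration; the FOLL_SPLIT value is an assumption about mm.h at the time,
while FOLL_SPLIT_PMD uses the value from this patch.

```c
#include <assert.h>

#define FOLL_SPLIT	0x80	/* assumption: value in mm.h at the time */
#define FOLL_SPLIT_PMD	0x20000	/* value added by this patch */

enum model_action { MODEL_NO_SPLIT, MODEL_SPLIT_PAGE, MODEL_SPLIT_PMD_ONLY };

/* Hypothetical model of which split the huge-pmd path would perform. */
static enum model_action model_pmd_action(unsigned int flags, int is_huge_zero)
{
	if (!(flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)))
		return MODEL_NO_SPLIT;
	if (is_huge_zero)
		return MODEL_SPLIT_PMD_ONLY;	/* zero page: only the pmd is split */
	if (flags & FOLL_SPLIT)
		return MODEL_SPLIT_PAGE;	/* split the compound page itself */
	return MODEL_SPLIT_PMD_ONLY;		/* FOLL_SPLIT_PMD */
}

/* Example: FOLL_SPLIT_PMD alone selects the pmd-only path. */
static void model_example(void)
{
	assert(model_pmd_action(FOLL_SPLIT_PMD, 0) == MODEL_SPLIT_PMD_ONLY);
}
```

Note that when both flags are set, FOLL_SPLIT takes precedence in this model,
matching the order of the else-if chain in the diff below.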

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1bdaf1872492..8b5f4a9aea0b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2633,6 +2633,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_COW	0x4000	/* internal GUP flag */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
+#define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 
 /*
  * NOTE on FOLL_LONGTERM:
diff --git a/mm/gup.c b/mm/gup.c
index 63ac50e48072..bdc350d95d99 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -398,7 +398,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
-	if (flags & FOLL_SPLIT) {
+	if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
 		int ret;
 		page = pmd_page(*pmd);
 		if (is_huge_zero_page(page)) {
@@ -407,7 +407,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 			split_huge_pmd(vma, pmd, address);
 			if (pmd_trans_unstable(pmd))
 				ret = -EBUSY;
-		} else {
+		} else if (flags & FOLL_SPLIT) {
 			if (unlikely(!try_get_page(page))) {
 				spin_unlock(ptl);
 				return ERR_PTR(-ENOMEM);
@@ -419,8 +419,17 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 			put_page(page);
 			if (pmd_none(*pmd))
 				return no_page_table(vma, flags);
-		}
+		} else {  /* flags & FOLL_SPLIT_PMD */
+			pte_t *pte;
 
+			spin_unlock(ptl);
+			split_huge_pmd(vma, pmd, address);
+			pte = get_locked_pte(mm, address, &ptl);
+			if (!pte)
+				return no_page_table(vma, flags);
+			spin_unlock(ptl);
+			ret = 0;
+		}
 		return ret ? ERR_PTR(ret) :
 			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
-- 
2.17.1



* [PATCH uprobe, thp v2 4/5] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT
  2019-06-04 16:51 [PATCH uprobe, thp v2 0/5] THP aware uprobe Song Liu
                   ` (2 preceding siblings ...)
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 3/5] mm, thp: introduce FOLL_SPLIT_PMD Song Liu
@ 2019-06-04 16:51 ` Song Liu
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 5/5] uprobe: collapse THP pmd after removing all uprobes Song Liu
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2019-06-04 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: peterz, oleg, rostedt, mhiramat, kirill.shutemov, kernel-team,
	william.kucharski, Song Liu

This patch uses the newly added FOLL_SPLIT_PMD in uprobe. This enables easy
regrouping of the huge pmd after the uprobe is disabled (in the next patch).

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/uprobes.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3fca7c55d370..88a8e1624bfa 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -153,7 +153,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct page_vma_mapped_walk pvmw = {
-		.page = old_page,
+		.page = compound_head(old_page),
 		.vma = vma,
 		.address = addr,
 	};
@@ -165,8 +165,6 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
 
-	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
-
 	if (!orig) {
 		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
 					    &memcg, false);
@@ -188,7 +186,9 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	get_page(new_page);
 	if (orig) {
+		lock_page(new_page);  /* for page_add_file_rmap() */
 		page_add_file_rmap(new_page, false);
+		unlock_page(new_page);
 		inc_mm_counter(mm, mm_counter_file(new_page));
 		dec_mm_counter(mm, MM_ANONPAGES);
 	} else {
@@ -482,7 +482,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 retry:
 	/* Read the page with vaddr into memory */
 	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
-			FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL);
+			FOLL_FORCE | FOLL_SPLIT_PMD, &old_page, &vma, NULL);
 	if (ret <= 0)
 		return ret;
 
-- 
2.17.1



* [PATCH uprobe, thp v2 5/5] uprobe: collapse THP pmd after removing all uprobes
  2019-06-04 16:51 [PATCH uprobe, thp v2 0/5] THP aware uprobe Song Liu
                   ` (3 preceding siblings ...)
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 4/5] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT Song Liu
@ 2019-06-04 16:51 ` Song Liu
  4 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2019-06-04 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: peterz, oleg, rostedt, mhiramat, kirill.shutemov, kernel-team,
	william.kucharski, Song Liu

After all uprobes are removed from the huge page (with PTE pgtable), it
is possible to collapse the pmd and benefit from THP again. This patch
does the collapse.
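
The precondition checked before collapsing (step 1 in the code below) can be
modelled in userspace like this. The array-of-indices representation,
MODEL_HPAGE_PMD_NR, MODEL_PTE_NONE and model_can_collapse() are all
assumptions made for the sketch; the kernel code walks real PTEs instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HPAGE_PMD_NR 512		/* assumption: 2 MiB THP, 4 KiB pages */
#define MODEL_PTE_NONE ((size_t)-1)	/* marks an empty slot */

/*
 * Hypothetical model of step 1: ptes[i] holds the index of the subpage
 * mapped by slot i, or MODEL_PTE_NONE. Collapse is allowed only if every
 * populated slot i maps subpage i of the huge page. On success, *count
 * receives the number of populated slots.
 */
static bool model_can_collapse(const size_t *ptes, int *count)
{
	int i, n = 0;

	assert(ptes && count);
	for (i = 0; i < MODEL_HPAGE_PMD_NR; i++) {
		if (ptes[i] == MODEL_PTE_NONE)
			continue;
		if (ptes[i] != (size_t)i)
			return false;	/* slot maps some other page */
		n++;
	}
	*count = n;
	return true;
}
```

A single slot that still points at a different page (for example an
anonymous copy holding a breakpoint) is enough to make the check fail.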

An issue in an earlier version was discovered by the kbuild test robot.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/huge_mm.h |  7 +++++
 kernel/events/uprobes.c |  3 ++
 mm/huge_memory.c        | 70 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..b969022dc922 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,9 @@ static inline bool thp_migration_supported(void)
 	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
 }
 
+extern inline void try_collapse_huge_pmd(struct vm_area_struct *vma,
+					 struct page *page);
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -368,6 +371,10 @@ static inline bool thp_migration_supported(void)
 {
 	return false;
 }
+
+static inline void try_collapse_huge_pmd(struct vm_area_struct *vma,
+					 struct page *page) {}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 88a8e1624bfa..0c8e2358dbf5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -537,6 +537,9 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret && is_register && ref_ctr_updated)
 		update_ref_ctr(uprobe, mm, -1);
 
+	if (!ret && orig_page && PageTransCompound(orig_page))
+		try_collapse_huge_pmd(vma, orig_page);
+
 	return ret;
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..03855a480fd2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2886,6 +2886,76 @@ static struct shrinker deferred_split_shrinker = {
 	.flags = SHRINKER_NUMA_AWARE,
 };
 
+/**
+ * try_collapse_huge_pmd - try collapse pmd for a pte mapped huge page
+ * @vma: vma containing the huge page
+ * @page: any sub page of the huge page
+ */
+void try_collapse_huge_pmd(struct vm_area_struct *vma,
+			   struct page *page)
+{
+	struct page *hpage = compound_head(page);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long haddr;
+	unsigned long addr;
+	pmd_t *pmd, _pmd;
+	int i, count = 0;
+	spinlock_t *ptl;
+
+	VM_BUG_ON_PAGE(!PageCompound(page), page);
+
+	haddr = page_address_in_vma(hpage, vma);
+	pmd = mm_find_pmd(mm, haddr);
+	if (!pmd)
+		return;
+
+	ptl = pmd_lock(mm, pmd);
+
+	/* step 1: check all mapped PTEs */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+
+		if (pte_none(*pte))
+			continue;
+		if (hpage + i != vm_normal_page(vma, addr, *pte)) {
+			spin_unlock(ptl);
+			return;
+		}
+		count++;
+	}
+
+	/* step 2: adjust rmap and refcount */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+		struct page *p;
+
+		if (pte_none(*pte))
+			continue;
+		p = vm_normal_page(vma, addr, *pte);
+		lock_page(p);
+		page_remove_rmap(p, false);
+		unlock_page(p);
+		put_page(p);
+	}
+
+	/* step 3: flip page table */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+	spin_unlock(ptl);
+	mmu_notifier_invalidate_range_end(&range);
+
+	/* step 4: free pgtable, clean up counters, etc. */
+	mm_dec_nr_ptes(mm);
+	pte_free(mm, pmd_pgtable(_pmd));
+	add_mm_counter(mm,
+		       shmem_file(vma->vm_file) ? MM_SHMEMPAGES : MM_FILEPAGES,
+		       -count);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int split_huge_pages_set(void *data, u64 val)
 {
-- 
2.17.1



* Re: [PATCH uprobe, thp v2 2/5] uprobe: use original page when all uprobes are removed
  2019-06-04 16:51 ` [PATCH uprobe, thp v2 2/5] uprobe: use original page when all uprobes are removed Song Liu
@ 2019-06-05 10:02   ` Oleg Nesterov
  2019-06-05 16:29     ` Song Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2019-06-05 10:02 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-mm, peterz, rostedt, mhiramat,
	kirill.shutemov, kernel-team, william.kucharski

On 06/04, Song Liu wrote:
>
> Currently, uprobe swaps the target page with a anonymous page in both
> install_breakpoint() and remove_breakpoint(). When all uprobes on a page
> are removed, the given mm is still using an anonymous page (not the
> original page).

Agreed, it would be nice to avoid this,

> @@ -461,9 +471,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  			unsigned long vaddr, uprobe_opcode_t opcode)
>  {
>  	struct uprobe *uprobe;
> -	struct page *old_page, *new_page;
> +	struct page *old_page, *new_page, *orig_page = NULL;
>  	struct vm_area_struct *vma;
>  	int ret, is_register, ref_ctr_updated = 0;
> +	pgoff_t index;
>  
>  	is_register = is_swbp_insn(&opcode);
>  	uprobe = container_of(auprobe, struct uprobe, arch);
> @@ -501,6 +512,19 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  	copy_highpage(new_page, old_page);
>  	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
>  
> +	index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
> +	orig_page = find_get_page(vma->vm_file->f_inode->i_mapping, index);

I think you should take is_register into account, if it is true we are going
to install the breakpoint so we can avoid find_get_page/pages_identical.

> +	if (orig_page) {
> +		if (pages_identical(new_page, orig_page)) {
> +			/* if new_page matches orig_page, use orig_page */
> +			put_page(new_page);
> +			new_page = orig_page;

Hmm. can't we simply unmap the page in this case?

Oleg.



* Re: [PATCH uprobe, thp v2 2/5] uprobe: use original page when all uprobes are removed
  2019-06-05 10:02   ` Oleg Nesterov
@ 2019-06-05 16:29     ` Song Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2019-06-05 16:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Linux-MM, Peter Zijlstra, Steven Rostedt, Masami Hiramatsu,
	Kirill A. Shutemov, Kernel Team, william.kucharski

Hi Oleg,

Thanks for your kind review!

> On Jun 5, 2019, at 3:02 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 06/04, Song Liu wrote:
>> 
>> Currently, uprobe swaps the target page with a anonymous page in both
>> install_breakpoint() and remove_breakpoint(). When all uprobes on a page
>> are removed, the given mm is still using an anonymous page (not the
>> original page).
> 
> Agreed, it would be nice to avoid this,
> 
>> @@ -461,9 +471,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>> 			unsigned long vaddr, uprobe_opcode_t opcode)
>> {
>> 	struct uprobe *uprobe;
>> -	struct page *old_page, *new_page;
>> +	struct page *old_page, *new_page, *orig_page = NULL;
>> 	struct vm_area_struct *vma;
>> 	int ret, is_register, ref_ctr_updated = 0;
>> +	pgoff_t index;
>> 
>> 	is_register = is_swbp_insn(&opcode);
>> 	uprobe = container_of(auprobe, struct uprobe, arch);
>> @@ -501,6 +512,19 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>> 	copy_highpage(new_page, old_page);
>> 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
>> 
>> +	index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
>> +	orig_page = find_get_page(vma->vm_file->f_inode->i_mapping, index);
> 
> I think you should take is_register into account, if it is true we are going
> to install the breakpoint so we can avoid find_get_page/pages_identical.

Good idea! I will add this to v3. 

> 
>> +	if (orig_page) {
>> +		if (pages_identical(new_page, orig_page)) {
>> +			/* if new_page matches orig_page, use orig_page */
>> +			put_page(new_page);
>> +			new_page = orig_page;
> 
> Hmm. can't we simply unmap the page in this case?

I haven't found an easier way here. I tried zap_vma_ptes() and
unmap_page_range(), but neither of them works well here.

Also, we need to deal with *_mm_counter, rmap, etc. So I guess reusing
__replace_page() (as in the current patch) is probably the easiest solution.

Did I miss anything? 

Thanks again,
Song




