All of lore.kernel.org
* [PATCH v8 0/4] THP aware uprobe
@ 2019-07-24  8:35 Song Liu
  2019-07-24  8:35 ` [PATCH v8 1/4] mm: move memcmp_pages() and pages_identical() Song Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Song Liu @ 2019-07-24  8:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, peterz, oleg, rostedt,
	kernel-team, william.kucharski, Song Liu

This set makes uprobe aware of THPs.

Currently, when a uprobe is attached to text on a THP, the page is split by
FOLL_SPLIT, which eliminates the performance benefit of the THP.

Instead of FOLL_SPLIT, this set introduces FOLL_SPLIT_PMD, which splits only
the PMD for the uprobe and leaves the underlying huge page intact.

TODO (temporarily removed in v7):
After all uprobes within the THP are removed, regroup the PTE-mapped pages
into huge PMD.

This set (plus a few THP patches) is also available at

   https://github.com/liu-song-6/linux/tree/uprobe-thp

Changes v7 => v8:
1. check PageUptodate() for orig_page (Oleg Nesterov).

Changes v6 => v7:
1. Include Acked-by from Kirill A. Shutemov for the first 4 patches;
2. Keep only the first 4 patches (while I work on improving the last 2).

Changes v5 => v6:
1. Enable khugepaged to collapse pmd for pte-mapped THP
   (Kirill A. Shutemov).
2. uprobe asks khugepaged to collapse the pmd. (Kirill A. Shutemov)

Note: The last two patches in the v6 set apply _after_ v7 of the set "Enable
      THP for text section of non-shmem files"

Changes v4 => v5:
1. Propagate pte_alloc() error out of follow_pmd_mask().

Changes since v3:
1. Simplify FOLL_SPLIT_PMD case in follow_pmd_mask(). (Kirill A. Shutemov)
2. Fix try_collapse_huge_pmd() to match change in follow_pmd_mask().

Changes since v2:
1. For FOLL_SPLIT_PMD, populate the page table in follow_pmd_mask().
2. Simplify logic in uprobe_write_opcode. (Oleg Nesterov)
3. Fix page refcount handling with FOLL_SPLIT_PMD.
4. Much more testing, together with THP on ext4 and btrfs (sending in
   separate set).
5. Rebased.

Changes since v1:
1. introduce FOLL_SPLIT_PMD, instead of modifying split_huge_pmd*();
2. reuse pages_identical() from ksm.c;
3. rewrite most of try_collapse_huge_pmd().

Song Liu (4):
  mm: move memcmp_pages() and pages_identical()
  uprobe: use original page when all uprobes are removed
  mm, thp: introduce FOLL_SPLIT_PMD
  uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT

 include/linux/mm.h      |  8 +++++++
 kernel/events/uprobes.c | 52 +++++++++++++++++++++++++++++++----------
 mm/gup.c                |  8 +++++--
 mm/ksm.c                | 18 --------------
 mm/util.c               | 13 +++++++++++
 5 files changed, 67 insertions(+), 32 deletions(-)

--
2.17.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v8 1/4] mm: move memcmp_pages() and pages_identical()
  2019-07-24  8:35 [PATCH v8 0/4] THP aware uprobe Song Liu
@ 2019-07-24  8:35 ` Song Liu
  2019-07-24  8:35 ` [PATCH v8 2/4] uprobe: use original page when all uprobes are removed Song Liu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2019-07-24  8:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, peterz, oleg, rostedt,
	kernel-team, william.kucharski, Song Liu

This patch moves memcmp_pages() to mm/util.c and pages_identical() to
mm.h, so that we can use them in other files.
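The helpers are small; their semantics can be sketched with a userspace
analogue (illustrative only: the kernel versions take struct page pointers
and kmap both pages before comparing):

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Userspace sketch of memcmp_pages(): byte-compare two page-sized
 * buffers. The kernel version kmap_atomic()s both struct pages first. */
static int memcmp_pages(const void *page1, const void *page2)
{
	return memcmp(page1, page2, PAGE_SIZE);
}

/* pages_identical() is just a zero-test on the comparison result. */
static int pages_identical(const void *page1, const void *page2)
{
	return !memcmp_pages(page1, page2);
}
```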

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/mm.h |  7 +++++++
 mm/ksm.c           | 18 ------------------
 mm/util.c          | 13 +++++++++++++
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..f189176dabed 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2891,5 +2891,12 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int memcmp_pages(struct page *page1, struct page *page2);
+
+static inline int pages_identical(struct page *page1, struct page *page2)
+{
+	return !memcmp_pages(page1, page2);
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/ksm.c b/mm/ksm.c
index 3dc4346411e4..dbee2eb4dd05 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1029,24 +1029,6 @@ static u32 calc_checksum(struct page *page)
 	return checksum;
 }
 
-static int memcmp_pages(struct page *page1, struct page *page2)
-{
-	char *addr1, *addr2;
-	int ret;
-
-	addr1 = kmap_atomic(page1);
-	addr2 = kmap_atomic(page2);
-	ret = memcmp(addr1, addr2, PAGE_SIZE);
-	kunmap_atomic(addr2);
-	kunmap_atomic(addr1);
-	return ret;
-}
-
-static inline int pages_identical(struct page *page1, struct page *page2)
-{
-	return !memcmp_pages(page1, page2);
-}
-
 static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 			      pte_t *orig_pte)
 {
diff --git a/mm/util.c b/mm/util.c
index e6351a80f248..0d5e2f425612 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -783,3 +783,16 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
 out:
 	return res;
 }
+
+int memcmp_pages(struct page *page1, struct page *page2)
+{
+	char *addr1, *addr2;
+	int ret;
+
+	addr1 = kmap_atomic(page1);
+	addr2 = kmap_atomic(page2);
+	ret = memcmp(addr1, addr2, PAGE_SIZE);
+	kunmap_atomic(addr2);
+	kunmap_atomic(addr1);
+	return ret;
+}
-- 
2.17.1



* [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-24  8:35 [PATCH v8 0/4] THP aware uprobe Song Liu
  2019-07-24  8:35 ` [PATCH v8 1/4] mm: move memcmp_pages() and pages_identical() Song Liu
@ 2019-07-24  8:35 ` Song Liu
  2019-07-24  9:07   ` Oleg Nesterov
  2019-07-24 11:37   ` Oleg Nesterov
  2019-07-24  8:35 ` [PATCH v8 3/4] mm, thp: introduce FOLL_SPLIT_PMD Song Liu
  2019-07-24  8:36 ` [PATCH v8 4/4] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT Song Liu
  3 siblings, 2 replies; 15+ messages in thread
From: Song Liu @ 2019-07-24  8:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, peterz, oleg, rostedt,
	kernel-team, william.kucharski, Song Liu

Currently, uprobe swaps the target page with an anonymous page in both
install_breakpoint() and remove_breakpoint(). When all uprobes on a page
are removed, the given mm is still using an anonymous page (not the
original page).

This patch allows uprobe to use the original page when possible (i.e., when
all uprobes on the page are already removed).
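The condition the patch checks can be modeled from userspace (illustrative
sketch; the helper name is made up, and a real text mapping is not writable,
uprobe writes the page through get_user_pages() instead): once the
breakpoint byte is removed from the private COW copy, the copy is
byte-identical to the original file page, so the kernel can map the
page-cache page back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if, after "installing" and then "removing" a breakpoint
 * byte in a private copy, the copy matches the original page again. */
static int breakpoint_roundtrip_identical(void)
{
	char tmpl[] = "/tmp/uprobe-demo-XXXXXX";
	char orig[4096];
	char *text;
	int fd, identical;

	fd = mkstemp(tmpl);
	if (fd < 0)
		return -1;
	memset(orig, 0x90, sizeof(orig));	/* pretend text: x86 NOPs */
	if (write(fd, orig, sizeof(orig)) != (ssize_t)sizeof(orig))
		return -1;

	text = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (text == MAP_FAILED)
		return -1;

	text[64] = (char)0xcc;			/* "install" breakpoint: COW   */
	assert(memcmp(text, orig, 4096) != 0);	/* anon copy differs now       */

	text[64] = (char)0x90;			/* "remove" the breakpoint     */
	identical = !memcmp(text, orig, 4096);	/* back to original content    */

	munmap(text, 4096);
	close(fd);
	unlink(tmpl);
	return identical;
}
```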

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/uprobes.c | 46 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 84fa00497c49..6b217bd031ef 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -160,16 +160,19 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	int err;
 	struct mmu_notifier_range range;
 	struct mem_cgroup *memcg;
+	bool orig = new_page->mapping != NULL;  /* new_page == orig_page */
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
 
 	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
 
-	err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg,
-			false);
-	if (err)
-		return err;
+	if (!orig) {
+		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
+					    &memcg, false);
+		if (err)
+			return err;
+	}
 
 	/* For try_to_free_swap() and munlock_vma_page() below */
 	lock_page(old_page);
@@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	mmu_notifier_invalidate_range_start(&range);
 	err = -EAGAIN;
 	if (!page_vma_mapped_walk(&pvmw)) {
-		mem_cgroup_cancel_charge(new_page, memcg, false);
+		if (!orig)
+			mem_cgroup_cancel_charge(new_page, memcg, false);
 		goto unlock;
 	}
 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
 
 	get_page(new_page);
-	page_add_new_anon_rmap(new_page, vma, addr, false);
-	mem_cgroup_commit_charge(new_page, memcg, false, false);
-	lru_cache_add_active_or_unevictable(new_page, vma);
+	if (orig) {
+		lock_page(new_page);  /* for page_add_file_rmap() */
+		page_add_file_rmap(new_page, false);
+		unlock_page(new_page);
+		inc_mm_counter(mm, mm_counter_file(new_page));
+		dec_mm_counter(mm, MM_ANONPAGES);
+	} else {
+		page_add_new_anon_rmap(new_page, vma, addr, false);
+		mem_cgroup_commit_charge(new_page, memcg, false, false);
+		lru_cache_add_active_or_unevictable(new_page, vma);
+	}
 
 	if (!PageAnon(old_page)) {
 		dec_mm_counter(mm, mm_counter_file(old_page));
@@ -501,6 +513,24 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	copy_highpage(new_page, old_page);
 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 
+	if (!is_register) {
+		struct page *orig_page;
+		pgoff_t index;
+
+		index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
+		orig_page = find_get_page(vma->vm_file->f_inode->i_mapping,
+					  index);
+
+		if (orig_page) {
+			if (PageUptodate(orig_page) &&
+			    pages_identical(new_page, orig_page)) {
+				put_page(new_page);
+				new_page = orig_page;
+			} else
+				put_page(orig_page);
+		}
+	}
+
 	ret = __replace_page(vma, vaddr, old_page, new_page);
 	put_page(new_page);
 put_old:
-- 
2.17.1



* [PATCH v8 3/4] mm, thp: introduce FOLL_SPLIT_PMD
  2019-07-24  8:35 [PATCH v8 0/4] THP aware uprobe Song Liu
  2019-07-24  8:35 ` [PATCH v8 1/4] mm: move memcmp_pages() and pages_identical() Song Liu
  2019-07-24  8:35 ` [PATCH v8 2/4] uprobe: use original page when all uprobes are removed Song Liu
@ 2019-07-24  8:35 ` Song Liu
  2019-07-24  8:36 ` [PATCH v8 4/4] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT Song Liu
  3 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2019-07-24  8:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, peterz, oleg, rostedt,
	kernel-team, william.kucharski, Song Liu

This patch introduces a new foll_flag: FOLL_SPLIT_PMD. As the name says,
FOLL_SPLIT_PMD splits the huge pmd for the given mm_struct, while the
underlying huge page stays as-is.

FOLL_SPLIT_PMD is useful for cases where we need to use regular pages,
but would like to switch back to the huge page and huge pmd later. One
such example is uprobe. The following patches use FOLL_SPLIT_PMD in uprobe.
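The resulting branch structure in follow_pmd_mask() can be sketched as
follows (illustrative userspace model: the huge-zero-page case and locking
are omitted, and the FOLL_SPLIT value is a placeholder; only
FOLL_SPLIT_PMD's 0x20000 comes from this patch):

```c
#include <assert.h>

/* Flag values are illustrative; FOLL_SPLIT_PMD (0x20000) matches the
 * patch, the FOLL_SPLIT value here is just a placeholder. */
#define FOLL_SPLIT	0x80
#define FOLL_SPLIT_PMD	0x20000

enum pmd_action { FOLLOW_PTE, SPLIT_HUGE_PAGE, SPLIT_PMD_ONLY };

/* Mirrors the patched branch structure: FOLL_SPLIT splits the huge
 * page itself; FOLL_SPLIT_PMD only splits the pmd mapping and leaves
 * the compound page intact. */
static enum pmd_action pmd_split_action(unsigned int flags)
{
	if (!(flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)))
		return FOLLOW_PTE;	/* take the follow_page_pte() path  */
	if (flags & FOLL_SPLIT)
		return SPLIT_HUGE_PAGE;	/* split_huge_page() on the page    */
	return SPLIT_PMD_ONLY;		/* split_huge_pmd() + pte_alloc()   */
}
```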

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/mm.h | 1 +
 mm/gup.c           | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f189176dabed..74db879711eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2614,6 +2614,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_COW	0x4000	/* internal GUP flag */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
+#define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 
 /*
  * NOTE on FOLL_LONGTERM:
diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab37bac..3c514e223ce3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -399,7 +399,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
-	if (flags & FOLL_SPLIT) {
+	if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
 		int ret;
 		page = pmd_page(*pmd);
 		if (is_huge_zero_page(page)) {
@@ -408,7 +408,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 			split_huge_pmd(vma, pmd, address);
 			if (pmd_trans_unstable(pmd))
 				ret = -EBUSY;
-		} else {
+		} else if (flags & FOLL_SPLIT) {
 			if (unlikely(!try_get_page(page))) {
 				spin_unlock(ptl);
 				return ERR_PTR(-ENOMEM);
@@ -420,6 +420,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 			put_page(page);
 			if (pmd_none(*pmd))
 				return no_page_table(vma, flags);
+		} else {  /* flags & FOLL_SPLIT_PMD */
+			spin_unlock(ptl);
+			split_huge_pmd(vma, pmd, address);
+			ret = pte_alloc(mm, pmd);
 		}
 
 		return ret ? ERR_PTR(ret) :
-- 
2.17.1



* [PATCH v8 4/4] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT
  2019-07-24  8:35 [PATCH v8 0/4] THP aware uprobe Song Liu
                   ` (2 preceding siblings ...)
  2019-07-24  8:35 ` [PATCH v8 3/4] mm, thp: introduce FOLL_SPLIT_PMD Song Liu
@ 2019-07-24  8:36 ` Song Liu
  3 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2019-07-24  8:36 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, peterz, oleg, rostedt,
	kernel-team, william.kucharski, Song Liu

This patch uses the newly added FOLL_SPLIT_PMD in uprobe. This enables easy
regrouping of the huge pmd after the uprobe is disabled (in the next patch).

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/uprobes.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6b217bd031ef..7d11ea16d471 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -153,7 +153,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct page_vma_mapped_walk pvmw = {
-		.page = old_page,
+		.page = compound_head(old_page),
 		.vma = vma,
 		.address = addr,
 	};
@@ -165,8 +165,6 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
 
-	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
-
 	if (!orig) {
 		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
 					    &memcg, false);
@@ -483,7 +481,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 retry:
 	/* Read the page with vaddr into memory */
 	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
-			FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL);
+			FOLL_FORCE | FOLL_SPLIT_PMD, &old_page, &vma, NULL);
 	if (ret <= 0)
 		return ret;
 
-- 
2.17.1



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-24  8:35 ` [PATCH v8 2/4] uprobe: use original page when all uprobes are removed Song Liu
@ 2019-07-24  9:07   ` Oleg Nesterov
  2019-07-24  9:17     ` Oleg Nesterov
  2019-07-24 11:37   ` Oleg Nesterov
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2019-07-24  9:07 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, kernel-team, william.kucharski

On 07/24, Song Liu wrote:
>
> This patch allows uprobe to use original page when possible (all uprobes
> on the page are already removed).

and only if the original page is already in the page cache and uptodate,
right?

another reason why I think unmap makes more sense... but I won't argue.

Oleg.



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-24  9:07   ` Oleg Nesterov
@ 2019-07-24  9:17     ` Oleg Nesterov
  2019-07-24  9:20       ` Song Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2019-07-24  9:17 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, kernel-team, william.kucharski

On 07/24, Oleg Nesterov wrote:
>
> On 07/24, Song Liu wrote:
> >
> > This patch allows uprobe to use original page when possible (all uprobes
> > on the page are already removed).
>
> and only if the original page is already in the page cache and uptodate,
> right?
>
> another reason why I think unmap makes more sense... but I won't argue.

but somehow I forgot we need to read the original page anyway to check
pages_identical(), so unmap is not really better, please forget.

Oleg.



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-24  9:17     ` Oleg Nesterov
@ 2019-07-24  9:20       ` Song Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2019-07-24  9:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: lkml, linux-mm, akpm, matthew.wilcox, kirill.shutemov, peterz,
	rostedt, Kernel Team, william.kucharski



> On Jul 24, 2019, at 2:17 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/24, Oleg Nesterov wrote:
>> 
>> On 07/24, Song Liu wrote:
>>> 
>>> This patch allows uprobe to use original page when possible (all uprobes
>>> on the page are already removed).
>> 
>> and only if the original page is already in the page cache and uptodate,
>> right?
>> 
>> another reason why I think unmap makes more sense... but I won't argue.
> 
> but somehow I forgot we need to read the original page anyway to check
> pages_identical(), so unmap is not really better, please forget.
> 

Yeah, I was trying to explain this. :)

Thanks for your feedback.
Song



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-24  8:35 ` [PATCH v8 2/4] uprobe: use original page when all uprobes are removed Song Liu
  2019-07-24  9:07   ` Oleg Nesterov
@ 2019-07-24 11:37   ` Oleg Nesterov
  2019-07-24 18:52     ` Song Liu
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2019-07-24 11:37 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, kernel-team, william.kucharski

On 07/24, Song Liu wrote:
>
>  	lock_page(old_page);
> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  	mmu_notifier_invalidate_range_start(&range);
>  	err = -EAGAIN;
>  	if (!page_vma_mapped_walk(&pvmw)) {
> -		mem_cgroup_cancel_charge(new_page, memcg, false);
> +		if (!orig)
> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>  		goto unlock;
>  	}
>  	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>  
>  	get_page(new_page);
> -	page_add_new_anon_rmap(new_page, vma, addr, false);
> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
> -	lru_cache_add_active_or_unevictable(new_page, vma);
> +	if (orig) {
> +		lock_page(new_page);  /* for page_add_file_rmap() */
> +		page_add_file_rmap(new_page, false);


Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
race with truncate?


and I am worried this code can try to lock the same page twice...
Say, the probed application does MADV_DONTNEED and then writes "int3"
into vma->vm_file at the same address to fool verify_opcode().

Oleg.



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-24 11:37   ` Oleg Nesterov
@ 2019-07-24 18:52     ` Song Liu
  2019-07-25  8:14       ` Oleg Nesterov
  0 siblings, 1 reply; 15+ messages in thread
From: Song Liu @ 2019-07-24 18:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: lkml, Linux-MM, Andrew Morton, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, Kernel Team, william.kucharski



> On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/24, Song Liu wrote:
>> 
>> 	lock_page(old_page);
>> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>> 	mmu_notifier_invalidate_range_start(&range);
>> 	err = -EAGAIN;
>> 	if (!page_vma_mapped_walk(&pvmw)) {
>> -		mem_cgroup_cancel_charge(new_page, memcg, false);
>> +		if (!orig)
>> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>> 		goto unlock;
>> 	}
>> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>> 
>> 	get_page(new_page);
>> -	page_add_new_anon_rmap(new_page, vma, addr, false);
>> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
>> -	lru_cache_add_active_or_unevictable(new_page, vma);
>> +	if (orig) {
>> +		lock_page(new_page);  /* for page_add_file_rmap() */
>> +		page_add_file_rmap(new_page, false);
> 
> 
> Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
> race with truncate?

We can't race with truncate, because the file is open as binary and 
protected with DENYWRITE (ETXTBSY). 

> 
> 
> and I am worried this code can try to lock the same page twice...
> Say, the probed application does MADV_DONTNEED and then writes "int3"
> into vma->vm_file at the same address to fool verify_opcode().
> 

Do you mean the case where old_page == new_page? I think this won't 
happen, because in uprobe_write_opcode() we only do orig_page for 
!is_register case. 

Thanks,
Song



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-24 18:52     ` Song Liu
@ 2019-07-25  8:14       ` Oleg Nesterov
  2019-07-25 18:17         ` Song Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2019-07-25  8:14 UTC (permalink / raw)
  To: Song Liu
  Cc: lkml, Linux-MM, Andrew Morton, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, Kernel Team, william.kucharski

On 07/24, Song Liu wrote:
>
>
> > On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 07/24, Song Liu wrote:
> >>
> >> 	lock_page(old_page);
> >> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> >> 	mmu_notifier_invalidate_range_start(&range);
> >> 	err = -EAGAIN;
> >> 	if (!page_vma_mapped_walk(&pvmw)) {
> >> -		mem_cgroup_cancel_charge(new_page, memcg, false);
> >> +		if (!orig)
> >> +			mem_cgroup_cancel_charge(new_page, memcg, false);
> >> 		goto unlock;
> >> 	}
> >> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
> >>
> >> 	get_page(new_page);
> >> -	page_add_new_anon_rmap(new_page, vma, addr, false);
> >> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
> >> -	lru_cache_add_active_or_unevictable(new_page, vma);
> >> +	if (orig) {
> >> +		lock_page(new_page);  /* for page_add_file_rmap() */
> >> +		page_add_file_rmap(new_page, false);
> >
> >
> > Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
> > race with truncate?
>
> We can't race with truncate, because the file is open as binary and
> protected with DENYWRITE (ETXTBSY).

No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
libraries or other files which can be mmaped.

> > and I am worried this code can try to lock the same page twice...
> > Say, the probed application does MADV_DONTNEED and then writes "int3"
> > into vma->vm_file at the same address to fool verify_opcode().
> >
>
> Do you mean the case where old_page == new_page?

Yes,

> I think this won't
> happen, because in uprobe_write_opcode() we only do orig_page for
> !is_register case.

See above.

!is_register doesn't necessarily mean the original page was previously cow'ed.
And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.

Oleg.



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-25  8:14       ` Oleg Nesterov
@ 2019-07-25 18:17         ` Song Liu
  2019-07-26  6:07           ` Song Liu
  2019-07-26  8:44           ` Oleg Nesterov
  0 siblings, 2 replies; 15+ messages in thread
From: Song Liu @ 2019-07-25 18:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: lkml, Linux-MM, Andrew Morton, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, Kernel Team, william.kucharski



> On Jul 25, 2019, at 1:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/24, Song Liu wrote:
>> 
>> 
>>> On Jul 24, 2019, at 4:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> 
>>> On 07/24, Song Liu wrote:
>>>> 
>>>> 	lock_page(old_page);
>>>> @@ -177,15 +180,24 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>>>> 	mmu_notifier_invalidate_range_start(&range);
>>>> 	err = -EAGAIN;
>>>> 	if (!page_vma_mapped_walk(&pvmw)) {
>>>> -		mem_cgroup_cancel_charge(new_page, memcg, false);
>>>> +		if (!orig)
>>>> +			mem_cgroup_cancel_charge(new_page, memcg, false);
>>>> 		goto unlock;
>>>> 	}
>>>> 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
>>>> 
>>>> 	get_page(new_page);
>>>> -	page_add_new_anon_rmap(new_page, vma, addr, false);
>>>> -	mem_cgroup_commit_charge(new_page, memcg, false, false);
>>>> -	lru_cache_add_active_or_unevictable(new_page, vma);
>>>> +	if (orig) {
>>>> +		lock_page(new_page);  /* for page_add_file_rmap() */
>>>> +		page_add_file_rmap(new_page, false);
>>> 
>>> 
>>> Shouldn't we re-check new_page->mapping after lock_page() ? Or we can't
>>> race with truncate?
>> 
>> We can't race with truncate, because the file is open as binary and
>> protected with DENYWRITE (ETXTBSY).
> 
> No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
> libraries or other files which can be mmaped.

I see. Let me see how we can cover this. 

> 
>>> and I am worried this code can try to lock the same page twice...
>>> Say, the probed application does MADV_DONTNEED and then writes "int3"
>>> into vma->vm_file at the same address to fool verify_opcode().
>>> 
>> 
>> Do you mean the case where old_page == new_page?
> 
> Yes,
> 
>> I think this won't
>> happen, because in uprobe_write_opcode() we only do orig_page for
>> !is_register case.
> 
> See above.
> 
> !is_register doesn't necessarily mean the original page was previously cow'ed.
> And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.

I guess I know the case now. We can probably avoid this with a simple
check for old_page == new_page?

Thanks,
Song



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-25 18:17         ` Song Liu
@ 2019-07-26  6:07           ` Song Liu
  2019-07-26  8:44           ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Song Liu @ 2019-07-26  6:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: lkml, Linux-MM, Andrew Morton, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, Kernel Team, william.kucharski


Hi Oleg, 

>> 
>> No. Yes, deny_write_access() protects mm->exe_file, but not the dynamic
>> libraries or other files which can be mmaped.
> 
> I see. Let me see how we can cover this. 
> 
>> 
>>>> and I am worried this code can try to lock the same page twice...
>>>> Say, the probed application does MADV_DONTNEED and then writes "int3"
>>>> into vma->vm_file at the same address to fool verify_opcode().
>>>> 
>>> 
>>> Do you mean the case where old_page == new_page?
>> 
>> Yes,
>> 
>>> I think this won't
>>> happen, because in uprobe_write_opcode() we only do orig_page for
>>> !is_register case.
>> 
>> See above.
>> 
>> !is_register doesn't necessarily mean the original page was previously cow'ed.
>> And even if it was cow'ed, MADV_DONTNEED can restore the original mapping.
> 
> I guess I know the case now. We can probably avoid this with a simple
> check for old_page == new_page?

I decided to follow your suggestion of "unmap old_page; fault in orig_page". 
Please see v9 of the set. 

Thanks,
Song



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-25 18:17         ` Song Liu
  2019-07-26  6:07           ` Song Liu
@ 2019-07-26  8:44           ` Oleg Nesterov
  2019-07-26 21:19             ` Song Liu
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2019-07-26  8:44 UTC (permalink / raw)
  To: Song Liu
  Cc: lkml, Linux-MM, Andrew Morton, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, Kernel Team, william.kucharski

On 07/25, Song Liu wrote:
>
> I guess I know the case now. We can probably avoid this with a simple
> check for old_page == new_page?

better yet, I think we can check PageAnon(old_page) and avoid the unnecessary
__replace_page() in this case. See the patch below.

Anyway, why __replace_page() needs to lock both pages? This doesn't look nice
even if it were correct. I think it can do lock_page(old_page) later.

Oleg.


--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -488,6 +488,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 		ref_ctr_updated = 1;
 	}
 
+	ret = 0;
+	if (!is_register && !PageAnon(old_page))
+		goto put_old;
+
 	ret = anon_vma_prepare(vma);
 	if (ret)
 		goto put_old;



* Re: [PATCH v8 2/4] uprobe: use original page when all uprobes are removed
  2019-07-26  8:44           ` Oleg Nesterov
@ 2019-07-26 21:19             ` Song Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2019-07-26 21:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: lkml, Linux-MM, Andrew Morton, matthew.wilcox, kirill.shutemov,
	peterz, rostedt, Kernel Team, william.kucharski



> On Jul 26, 2019, at 1:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/25, Song Liu wrote:
>> 
>> I guess I know the case now. We can probably avoid this with a simple
>> check for old_page == new_page?
> 
> better yet, I think we can check PageAnon(old_page) and avoid the unnecessary
> __replace_page() in this case. See the patch below.

I added PageAnon(old_page) check in v9 of the patch. 

> 
> Anyway, why __replace_page() needs to lock both pages? This doesn't look nice
> even if it were correct. I think it can do lock_page(old_page) later.
> 

Agreed. I have changed v9 to only unmap old_page, so it should be cleaner.

Thanks again for these good suggestions,
Song

