* [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP
@ 2019-07-31 18:33 Song Liu
  2019-07-31 18:33 ` [PATCH v2 1/2] khugepaged: enable " Song Liu
  0 siblings, 3 replies; 10+ messages in thread
From: Song Liu @ 2019-07-31 18:33 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, oleg, kernel-team,
	william.kucharski, srikar, Song Liu

Changes v1 => v2:
1. Call collapse_pte_mapped_thp() directly from uprobe_write_opcode();
2. Add VM_BUG_ON() for addr alignment in khugepaged_add_pte_mapped_thp()
   and collapse_pte_mapped_thp().

This set is the newer version of patches 5/6 and 6/6 of [1]. The newer
version of patches 1-4 of that work [2] was recently picked up by Andrew.

Patch 1 enables khugepaged to handle pte-mapped THPs. These THPs are left
in that state when khugepaged fails to take the exclusive lock of mmap_sem.

Patch 2 leverages the work in patch 1 for uprobes on THP. After [2], a
uprobe only splits the PMD. When the uprobe is disabled, we are left with
a pte-mapped THP. After this set, such pte-mapped THPs will be collapsed
back to pmd-mapped.

[1] https://lkml.org/lkml/2019/6/23/23
[2] https://www.spinics.net/lists/linux-mm/msg185889.html

Song Liu (2):
  khugepaged: enable collapse pmd for pte-mapped THP
  uprobe: collapse THP pmd after removing all uprobes

 include/linux/khugepaged.h |  12 ++++
 kernel/events/uprobes.c    |   9 +++
 mm/khugepaged.c            | 138 +++++++++++++++++++++++++++++++++++++
 3 files changed, 159 insertions(+)

--
2.17.1


* [PATCH v2 1/2] khugepaged: enable collapse pmd for pte-mapped THP
  2019-07-31 18:33 [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP Song Liu
@ 2019-07-31 18:33 ` Song Liu
  2019-08-01 12:43   ` Oleg Nesterov
  2019-08-01 14:50   ` Oleg Nesterov
  2019-07-31 18:33 ` [PATCH v2 2/2] uprobe: collapse THP pmd after removing all uprobes Song Liu
  2019-08-01 11:19 ` [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP Kirill A. Shutemov
  2 siblings, 2 replies; 10+ messages in thread
From: Song Liu @ 2019-07-31 18:33 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, oleg, kernel-team,
	william.kucharski, srikar, Song Liu

khugepaged needs the exclusive mmap_sem to access the page table. When it
fails to lock mmap_sem, the page will fault in as a pte-mapped THP. As the
page is already a THP, khugepaged will not handle this pmd again.

This patch enables khugepaged to retry collapsing the page table.

struct mm_slot (in khugepaged.c) is extended with an array containing
addresses of pte-mapped THPs. We use an array here for simplicity; it can
easily be replaced with a more advanced data structure when needed. This
array is protected by khugepaged_mm_lock.

In khugepaged_scan_mm_slot(), if the mm contains pte-mapped THPs, we try
to collapse the page table.

Since the collapse may happen at a later time, some pages may have already
faulted in. collapse_pte_mapped_thp() is added to properly handle these
pages. It also double-checks that all PTEs in the pmd map to the same THP.
This is necessary because a subpage of the THP may have been replaced, for
example by a uprobe. In such cases, it is not possible to collapse the pmd.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/khugepaged.h |  12 ++++
 mm/khugepaged.c            | 140 +++++++++++++++++++++++++++++++++++++
 2 files changed, 152 insertions(+)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 082d1d2a5216..a75693b95071 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -15,6 +15,14 @@ extern int __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
 extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
 				      unsigned long vm_flags);
+#ifdef CONFIG_SHMEM
+extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long haddr);
+#else
+static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
+					   unsigned long haddr)
+{
+}
+#endif
 
 #define khugepaged_enabled()					       \
 	(transparent_hugepage_flags &				       \
@@ -73,6 +81,10 @@ static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
 {
 	return 0;
 }
+static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
+					   unsigned long haddr)
+{
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_KHUGEPAGED_H */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eaaa21b23215..2b580e264cc7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -76,6 +76,8 @@ static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __read_mostly;
 
+#define MAX_PTE_MAPPED_THP 8
+
 /**
  * struct mm_slot - hash lookup from mm to mm_slot
  * @hash: hash collision list
@@ -86,6 +88,10 @@ struct mm_slot {
 	struct hlist_node hash;
 	struct list_head mm_node;
 	struct mm_struct *mm;
+
+	/* pte-mapped THP in this mm */
+	int nr_pte_mapped_thp;
+	unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
 };
 
 /**
@@ -1248,6 +1254,135 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
 }
 
 #if defined(CONFIG_SHMEM) && defined(CONFIG_TRANSPARENT_HUGE_PAGECACHE)
+/*
+ * Notify khugepaged that the given addr of the mm is a pte-mapped THP.
+ * Then khugepaged should try to collapse the page table.
+ */
+static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
+					 unsigned long addr)
+{
+	struct mm_slot *mm_slot;
+	int ret = 0;
+
+	/* hold mmap_sem for khugepaged_test_exit() */
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
+	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
+
+	if (unlikely(khugepaged_test_exit(mm)))
+		return 0;
+
+	if (!test_bit(MMF_VM_HUGEPAGE, &mm->flags) &&
+	    !test_bit(MMF_DISABLE_THP, &mm->flags)) {
+		ret = __khugepaged_enter(mm);
+		if (ret)
+			return ret;
+	}
+
+	spin_lock(&khugepaged_mm_lock);
+	mm_slot = get_mm_slot(mm);
+	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
+		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
+	spin_unlock(&khugepaged_mm_lock);
+	return 0;
+}
+
+/**
+ * Try to collapse a pte-mapped THP for mm at address haddr.
+ *
+ * This function checks whether all the PTEs in the PMD are pointing to the
+ * right THP. If so, retract the page table so that the THP can refault in
+ * as pmd-mapped.
+ */
+void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long haddr)
+{
+	struct vm_area_struct *vma = find_vma(mm, haddr);
+	pmd_t *pmd = mm_find_pmd(mm, haddr);
+	struct page *hpage = NULL;
+	unsigned long addr;
+	spinlock_t *ptl;
+	int count = 0;
+	pmd_t _pmd;
+	int i;
+
+	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
+
+	if (!vma || !pmd || pmd_trans_huge(*pmd))
+		return;
+
+	/* step 1: check all mapped PTEs are to the right huge page */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+		struct page *page;
+
+		if (pte_none(*pte))
+			continue;
+
+		page = vm_normal_page(vma, addr, *pte);
+
+		if (!PageCompound(page))
+			return;
+
+		if (!hpage) {
+			hpage = compound_head(page);
+			if (hpage->mapping != vma->vm_file->f_mapping)
+				return;
+		}
+
+		if (hpage + i != page)
+			return;
+		count++;
+	}
+
+	/* step 2: adjust rmap */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+		struct page *page;
+
+		if (pte_none(*pte))
+			continue;
+		page = vm_normal_page(vma, addr, *pte);
+		page_remove_rmap(page, false);
+	}
+
+	/* step 3: set proper refcount and mm_counters. */
+	if (hpage) {
+		page_ref_sub(hpage, count);
+		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
+	}
+
+	/* step 4: collapse pmd */
+	ptl = pmd_lock(vma->vm_mm, pmd);
+	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+	spin_unlock(ptl);
+	mm_dec_nr_ptes(mm);
+	pte_free(mm, pmd_pgtable(_pmd));
+}
+
+static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
+{
+	struct mm_struct *mm = mm_slot->mm;
+	int i;
+
+	lockdep_assert_held(&khugepaged_mm_lock);
+
+	if (likely(mm_slot->nr_pte_mapped_thp == 0))
+		return 0;
+
+	if (!down_write_trylock(&mm->mmap_sem))
+		return -EBUSY;
+
+	if (unlikely(khugepaged_test_exit(mm)))
+		goto out;
+
+	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
+		collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i]);
+
+out:
+	mm_slot->nr_pte_mapped_thp = 0;
+	up_write(&mm->mmap_sem);
+	return 0;
+}
+
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
@@ -1281,6 +1416,10 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 			up_write(&vma->vm_mm->mmap_sem);
 			mm_dec_nr_ptes(vma->vm_mm);
 			pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+		} else if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
+			/* need down_read for khugepaged_test_exit() */
+			khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+			up_read(&vma->vm_mm->mmap_sem);
 		}
 	}
 	i_mmap_unlock_write(mapping);
@@ -1667,6 +1806,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 		khugepaged_scan.address = 0;
 		khugepaged_scan.mm_slot = mm_slot;
 	}
+	khugepaged_collapse_pte_mapped_thps(mm_slot);
 	spin_unlock(&khugepaged_mm_lock);
 
 	mm = mm_slot->mm;
-- 
2.17.1



* [PATCH v2 2/2] uprobe: collapse THP pmd after removing all uprobes
  2019-07-31 18:33 [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP Song Liu
  2019-07-31 18:33 ` [PATCH v2 1/2] khugepaged: enable " Song Liu
@ 2019-07-31 18:33 ` Song Liu
  2019-08-01 11:19 ` [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP Kirill A. Shutemov
  2 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2019-07-31 18:33 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kirill.shutemov, oleg, kernel-team,
	william.kucharski, srikar, Song Liu

After all uprobes are removed from a huge page (now mapped with a PTE page
table), it is possible to collapse the pmd and benefit from THP again.
This patch does the collapse by calling collapse_pte_mapped_thp().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/uprobes.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 27b596f14463..e5c30941ea04 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -26,6 +26,7 @@
 #include <linux/percpu-rwsem.h>
 #include <linux/task_work.h>
 #include <linux/shmem_fs.h>
+#include <linux/khugepaged.h>
 
 #include <linux/uprobes.h>
 
@@ -472,6 +473,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
 	int ret, is_register, ref_ctr_updated = 0;
+	bool orig_page_huge = false;
 
 	is_register = is_swbp_insn(&opcode);
 	uprobe = container_of(auprobe, struct uprobe, arch);
@@ -529,6 +531,9 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 				/* let go new_page */
 				put_page(new_page);
 				new_page = NULL;
+
+				if (PageCompound(orig_page))
+					orig_page_huge = true;
 			}
 			put_page(orig_page);
 		}
@@ -547,6 +552,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret && is_register && ref_ctr_updated)
 		update_ref_ctr(uprobe, mm, -1);
 
+	/* try to collapse pmd for compound page */
+	if (!ret && orig_page_huge)
+		collapse_pte_mapped_thp(mm, vaddr & HPAGE_PMD_MASK);
+
 	return ret;
 }
 
-- 
2.17.1



* Re: [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP
  2019-07-31 18:33 [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP Song Liu
  2019-07-31 18:33 ` [PATCH v2 1/2] khugepaged: enable " Song Liu
  2019-07-31 18:33 ` [PATCH v2 2/2] uprobe: collapse THP pmd after removing all uprobes Song Liu
@ 2019-08-01 11:19 ` Kirill A. Shutemov
  2 siblings, 0 replies; 10+ messages in thread
From: Kirill A. Shutemov @ 2019-08-01 11:19 UTC
  To: Song Liu
  Cc: linux-kernel, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	oleg, kernel-team, william.kucharski, srikar

On Wed, Jul 31, 2019 at 11:33:29AM -0700, Song Liu wrote:
> Changes v1 => v2:
> 1. Call collapse_pte_mapped_thp() directly from uprobe_write_opcode();
> 2. Add VM_BUG_ON() for addr alignment in khugepaged_add_pte_mapped_thp()
>    and collapse_pte_mapped_thp().
> 
> This set is the newer version of patches 5/6 and 6/6 of [1]. The newer
> version of patches 1-4 of that work [2] was recently picked up by Andrew.
> 
> Patch 1 enables khugepaged to handle pte-mapped THPs. These THPs are left
> in that state when khugepaged fails to take the exclusive lock of mmap_sem.
> 
> Patch 2 leverages the work in patch 1 for uprobes on THP. After [2], a
> uprobe only splits the PMD. When the uprobe is disabled, we are left with
> a pte-mapped THP. After this set, such pte-mapped THPs will be collapsed
> back to pmd-mapped.
> 
> [1] https://lkml.org/lkml/2019/6/23/23
> [2] https://www.spinics.net/lists/linux-mm/msg185889.html
> 
> Song Liu (2):
>   khugepaged: enable collapse pmd for pte-mapped THP
>   uprobe: collapse THP pmd after removing all uprobes

Looks good for the start.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCH v2 1/2] khugepaged: enable collapse pmd for pte-mapped THP
  2019-07-31 18:33 ` [PATCH v2 1/2] khugepaged: enable " Song Liu
@ 2019-08-01 12:43   ` Oleg Nesterov
  2019-08-01 17:11     ` Song Liu
  2019-08-01 14:50   ` Oleg Nesterov
  1 sibling, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2019-08-01 12:43 UTC
  To: Song Liu
  Cc: linux-kernel, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	kernel-team, william.kucharski, srikar

On 07/31, Song Liu wrote:
>
> +void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long haddr)
> +{
> +	struct vm_area_struct *vma = find_vma(mm, haddr);
> +	pmd_t *pmd = mm_find_pmd(mm, haddr);
> +	struct page *hpage = NULL;
> +	unsigned long addr;
> +	spinlock_t *ptl;
> +	int count = 0;
> +	pmd_t _pmd;
> +	int i;
> +
> +	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
> +
> +	if (!vma || !pmd || pmd_trans_huge(*pmd))
                            ^^^^^^^^^^^^^^^^^^^^

mm_find_pmd() returns NULL if pmd_trans_huge()
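
For reference, mm_find_pmd() (lightly paraphrased from mm/rmap.c of this
kernel) never returns a huge pmd:

pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
{
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd = NULL;
	pmd_t pmde;

	pgd = pgd_offset(mm, address);
	if (!pgd_present(*pgd))
		goto out;
	p4d = p4d_offset(pgd, address);
	if (!p4d_present(*p4d))
		goto out;
	pud = pud_offset(p4d, address);
	if (!pud_present(*pud))
		goto out;

	pmd = pmd_offset(pud, address);
	/* read the pmd value once; reject non-present and huge pmds */
	pmde = *pmd;
	barrier();
	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
		pmd = NULL;
out:
	return pmd;
}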

> +	/* step 1: check all mapped PTEs are to the right huge page */
> +	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
> +		pte_t *pte = pte_offset_map(pmd, addr);
> +		struct page *page;
> +
> +		if (pte_none(*pte))
> +			continue;
> +
> +		page = vm_normal_page(vma, addr, *pte);
> +
> +		if (!PageCompound(page))
> +			return;
> +
> +		if (!hpage) {
> +			hpage = compound_head(page);
> +			if (hpage->mapping != vma->vm_file->f_mapping)

Hmm. But how can we know this is still the same vma?

If nothing else, why can't vma->vm_file be NULL?

Say, a process unmaps this memory after khugepaged_add_pte_mapped_thp()
was called, then it does mmap(haddr, MAP_PRIVATE|MAP_ANONYMOUS), then
do_huge_pmd_anonymous_page() installs a huge page at the same address,
then split_huge_pmd() is called for any reason.

No?



* Re: [PATCH v2 1/2] khugepaged: enable collapse pmd for pte-mapped THP
  2019-07-31 18:33 ` [PATCH v2 1/2] khugepaged: enable " Song Liu
  2019-08-01 12:43   ` Oleg Nesterov
@ 2019-08-01 14:50   ` Oleg Nesterov
  2019-08-01 17:37     ` Song Liu
  1 sibling, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2019-08-01 14:50 UTC
  To: Song Liu
  Cc: linux-kernel, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	kernel-team, william.kucharski, srikar

On 07/31, Song Liu wrote:
>
> +static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
> +					 unsigned long addr)
> +{
> +	struct mm_slot *mm_slot;
> +	int ret = 0;
> +
> +	/* hold mmap_sem for khugepaged_test_exit() */
> +	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
> +	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
> +
> +	if (unlikely(khugepaged_test_exit(mm)))
> +		return 0;
> +
> +	if (!test_bit(MMF_VM_HUGEPAGE, &mm->flags) &&
> +	    !test_bit(MMF_DISABLE_THP, &mm->flags)) {
> +		ret = __khugepaged_enter(mm);
> +		if (ret)
> +			return ret;
> +	}

could you explain why we need mm->mmap_sem, the khugepaged_test_exit()
check and __khugepaged_enter()?


> +	spin_lock(&khugepaged_mm_lock);
> +	mm_slot = get_mm_slot(mm);
> +	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
> +		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;

If get_mm_slot() returns mm_slot != NULL, we can safely modify
->pte_mapped_thp. We do not care even if this task has already passed
__mmput/__khugepaged_exit; this slot can't go away.
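
For reference, get_mm_slot() (lightly paraphrased from mm/khugepaged.c) is
just a hash lookup; slots are unhashed under khugepaged_mm_lock and freed
only after that, so a slot found while holding the lock can't disappear
under us:

static struct mm_slot *get_mm_slot(struct mm_struct *mm)
{
	struct mm_slot *mm_slot;

	hash_for_each_possible(mm_slots_hash, mm_slot, hash,
			       (unsigned long)mm)
		if (mm == mm_slot->mm)
			return mm_slot;

	return NULL;
}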

No?

Oleg.



* Re: [PATCH v2 1/2] khugepaged: enable collapse pmd for pte-mapped THP
  2019-08-01 12:43   ` Oleg Nesterov
@ 2019-08-01 17:11     ` Song Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2019-08-01 17:11 UTC
  To: Oleg Nesterov
  Cc: lkml, Linux-MM, Andrew Morton, Matthew Wilcox,
	Kirill A. Shutemov, Kernel Team, William Kucharski, srikar



> On Aug 1, 2019, at 5:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/31, Song Liu wrote:
>> 
>> +void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long haddr)
>> +{
>> +	struct vm_area_struct *vma = find_vma(mm, haddr);
>> +	pmd_t *pmd = mm_find_pmd(mm, haddr);
>> +	struct page *hpage = NULL;
>> +	unsigned long addr;
>> +	spinlock_t *ptl;
>> +	int count = 0;
>> +	pmd_t _pmd;
>> +	int i;
>> +
>> +	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
>> +
>> +	if (!vma || !pmd || pmd_trans_huge(*pmd))
>                            ^^^^^^^^^^^^^^^^^^^^
> 
> mm_find_pmd() returns NULL if pmd_trans_huge()

Good catch! I will simplify this one in v3. 

> 
>> +	/* step 1: check all mapped PTEs are to the right huge page */
>> +	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
>> +		pte_t *pte = pte_offset_map(pmd, addr);
>> +		struct page *page;
>> +
>> +		if (pte_none(*pte))
>> +			continue;
>> +
>> +		page = vm_normal_page(vma, addr, *pte);
>> +
>> +		if (!PageCompound(page))
>> +			return;
>> +
>> +		if (!hpage) {
>> +			hpage = compound_head(page);
>> +			if (hpage->mapping != vma->vm_file->f_mapping)
> 
> Hmm. But how can we know this is still the same vma?
> 
> If nothing else, why can't vma->vm_file be NULL?

Good point. We should confirm vma->vm_file is not NULL. 

Thanks,
Song



* Re: [PATCH v2 1/2] khugepaged: enable collapse pmd for pte-mapped THP
  2019-08-01 14:50   ` Oleg Nesterov
@ 2019-08-01 17:37     ` Song Liu
  2019-08-02 10:31       ` Oleg Nesterov
  0 siblings, 1 reply; 10+ messages in thread
From: Song Liu @ 2019-08-01 17:37 UTC
  To: Oleg Nesterov
  Cc: lkml, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	Kernel Team, william.kucharski, srikar



> On Aug 1, 2019, at 7:50 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 07/31, Song Liu wrote:
>> 
>> +static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
>> +					 unsigned long addr)
>> +{
>> +	struct mm_slot *mm_slot;
>> +	int ret = 0;
>> +
>> +	/* hold mmap_sem for khugepaged_test_exit() */
>> +	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
>> +	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
>> +
>> +	if (unlikely(khugepaged_test_exit(mm)))
>> +		return 0;
>> +
>> +	if (!test_bit(MMF_VM_HUGEPAGE, &mm->flags) &&
>> +	    !test_bit(MMF_DISABLE_THP, &mm->flags)) {
>> +		ret = __khugepaged_enter(mm);
>> +		if (ret)
>> +			return ret;
>> +	}
> 
> could you explain why we need mm->mmap_sem, the khugepaged_test_exit()
> check and __khugepaged_enter()?

If the mm doesn't have a mm_slot, we would like to create one here (by 
calling __khugepaged_enter()). 

This happens when the THP is created by another mm, or by tmpfs with
"huge=always", and then the page table of this mm gets split by
split_huge_pmd(). With the current kernel, this happens when we
attach/detach a uprobe to a file in tmpfs with huge=always.

Does this answer your question?

Thanks,
Song


* Re: [PATCH v2 1/2] khugepaged: enable collapse pmd for pte-mapped THP
  2019-08-01 17:37     ` Song Liu
@ 2019-08-02 10:31       ` Oleg Nesterov
  2019-08-02 20:59         ` Song Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2019-08-02 10:31 UTC
  To: Song Liu
  Cc: lkml, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	Kernel Team, william.kucharski, srikar

On 08/01, Song Liu wrote:
>
>
> > On Aug 1, 2019, at 7:50 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 07/31, Song Liu wrote:
> >>
> >> +static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
> >> +					 unsigned long addr)
> >> +{
> >> +	struct mm_slot *mm_slot;
> >> +	int ret = 0;
> >> +
> >> +	/* hold mmap_sem for khugepaged_test_exit() */
> >> +	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
> >> +	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
> >> +
> >> +	if (unlikely(khugepaged_test_exit(mm)))
> >> +		return 0;
> >> +
> >> +	if (!test_bit(MMF_VM_HUGEPAGE, &mm->flags) &&
> >> +	    !test_bit(MMF_DISABLE_THP, &mm->flags)) {
> >> +		ret = __khugepaged_enter(mm);
> >> +		if (ret)
> >> +			return ret;
> >> +	}
> >
> > could you explain why we need mm->mmap_sem, the khugepaged_test_exit()
> > check and __khugepaged_enter()?
>
> If the mm doesn't have a mm_slot, we would like to create one here (by
> calling __khugepaged_enter()).

I can easily be wrong, I have never read this code before, but this
doesn't look correct.

Firstly, mm->mmap_sem can ONLY help if a) the task already has a mm_slot
and b) this mm_slot is khugepaged_scan.mm_slot. Otherwise khugepaged_exit()
won't take mmap_sem for writing and thus we can't rely on test_exit().

and this means that down_read(mmap_sem) before khugepaged_add_pte_mapped_thp()
is pointless and can't help; this mm was found by vma_interval_tree_foreach().

so __khugepaged_enter() can race with khugepaged_exit() and this is wrong
in any case.
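
For reference, khugepaged_test_exit() and the exit path, lightly
paraphrased from mm/khugepaged.c; note that the exit side takes mmap_sem
for writing only when the slot is the one khugepaged is currently
scanning:

static inline int khugepaged_test_exit(struct mm_struct *mm)
{
	return atomic_read(&mm->mm_users) == 0;
}

void __khugepaged_exit(struct mm_struct *mm)
{
	struct mm_slot *mm_slot;
	int free = 0;

	spin_lock(&khugepaged_mm_lock);
	mm_slot = get_mm_slot(mm);
	if (mm_slot && khugepaged_scan.mm_slot != mm_slot) {
		hash_del(&mm_slot->hash);
		list_del(&mm_slot->mm_node);
		free = 1;
	}
	spin_unlock(&khugepaged_mm_lock);

	if (free) {
		clear_bit(MMF_VM_HUGEPAGE, &mm->flags);
		free_mm_slot(mm_slot);
		mmdrop(mm);
	} else if (mm_slot) {
		/*
		 * Serialize against khugepaged_test_exit(), which runs
		 * under mmap_sem read mode; this only covers the slot
		 * khugepaged is currently scanning.
		 */
		down_write(&mm->mmap_sem);
		up_write(&mm->mmap_sem);
	}
}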

> This happens when the THP is created by another mm, or by tmpfs with
> "huge=always", and then the page table of this mm gets split by
> split_huge_pmd(). With the current kernel, this happens when we
> attach/detach a uprobe to a file in tmpfs with huge=always.

Well. In this particular case khugepaged_enter() was likely already called
by shmem_mmap() or khugepaged_enter_vma_merge(), or madvise.

(in fact I think do_set_pmd() or shmem_fault() should call _enter() too,
 like do_huge_pmd_anonymous_page() does, but this is another story).


And I forgot to mention... I don't understand why
khugepaged_collapse_pte_mapped_thps() has to be called with
khugepaged_mm_lock held.

Oleg.



* Re: [PATCH v2 1/2] khugepaged: enable collapse pmd for pte-mapped THP
  2019-08-02 10:31       ` Oleg Nesterov
@ 2019-08-02 20:59         ` Song Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2019-08-02 20:59 UTC
  To: Oleg Nesterov
  Cc: lkml, linux-mm, akpm, matthew.wilcox, kirill.shutemov,
	Kernel Team, william.kucharski, srikar



> On Aug 2, 2019, at 3:31 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 08/01, Song Liu wrote:
>> 
>> 
>>> On Aug 1, 2019, at 7:50 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> 
>>> On 07/31, Song Liu wrote:
>>>> 
>>>> +static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
>>>> +					 unsigned long addr)
>>>> +{
>>>> +	struct mm_slot *mm_slot;
>>>> +	int ret = 0;
>>>> +
>>>> +	/* hold mmap_sem for khugepaged_test_exit() */
>>>> +	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
>>>> +	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
>>>> +
>>>> +	if (unlikely(khugepaged_test_exit(mm)))
>>>> +		return 0;
>>>> +
>>>> +	if (!test_bit(MMF_VM_HUGEPAGE, &mm->flags) &&
>>>> +	    !test_bit(MMF_DISABLE_THP, &mm->flags)) {
>>>> +		ret = __khugepaged_enter(mm);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +	}
>>> 
>>> could you explain why we need mm->mmap_sem, the khugepaged_test_exit()
>>> check and __khugepaged_enter()?
>> 
>> If the mm doesn't have a mm_slot, we would like to create one here (by
>> calling __khugepaged_enter()).
> 
> I can easily be wrong, I have never read this code before, but this
> doesn't look correct.
> 
> Firstly, mm->mmap_sem can ONLY help if a) the task already has a mm_slot
> and b) this mm_slot is khugepaged_scan.mm_slot. Otherwise khugepaged_exit()
> won't take mmap_sem for writing and thus we can't rely on test_exit().
> 
> and this means that down_read(mmap_sem) before khugepaged_add_pte_mapped_thp()
> is pointless and can't help; this mm was found by vma_interval_tree_foreach().
> 
> so __khugepaged_enter() can race with khugepaged_exit() and this is wrong
> in any case.
> 
>> This happens when the THP is created by another mm, or by tmpfs with
>> "huge=always", and then the page table of this mm gets split by
>> split_huge_pmd(). With the current kernel, this happens when we
>> attach/detach a uprobe to a file in tmpfs with huge=always.
> 
> Well. In this particular case khugepaged_enter() was likely already called
> by shmem_mmap() or khugepaged_enter_vma_merge(), or madvise.
> 
> (in fact I think do_set_pmd() or shmem_fault() should call _enter() too,
> like do_huge_pmd_anonymous_page() does, but this is another story).

Hmm... I didn't notice the one in shmem_mmap(). And yes, you are right, we
don't really need __khugepaged_enter() in khugepaged_add_pte_mapped_thp(),
especially since uprobes.c doesn't call it.

This should simplify the patch.

> 
> 
> And I forgot to mention... I don't understand why
> khugepaged_collapse_pte_mapped_thps() has to be called with
> khugepaged_mm_lock held.

You are right that khugepaged_collapse_pte_mapped_thps() doesn't need
khugepaged_mm_lock in this version. It was necessary for v1, where
uprobes.c called khugepaged_add_pte_mapped_thp().

Let me clean this up. 

Thanks,
Song



end of thread

Thread overview: 10+ messages
2019-07-31 18:33 [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP Song Liu
2019-07-31 18:33 ` [PATCH v2 1/2] khugepaged: enable " Song Liu
2019-08-01 12:43   ` Oleg Nesterov
2019-08-01 17:11     ` Song Liu
2019-08-01 14:50   ` Oleg Nesterov
2019-08-01 17:37     ` Song Liu
2019-08-02 10:31       ` Oleg Nesterov
2019-08-02 20:59         ` Song Liu
2019-07-31 18:33 ` [PATCH v2 2/2] uprobe: collapse THP pmd after removing all uprobes Song Liu
2019-08-01 11:19 ` [PATCH v2 0/2] khugepaged: collapse pmd for pte-mapped THP Kirill A. Shutemov
