linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] Fixes for THP in page cache
From: Song Liu @ 2019-10-17 16:42 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kernel-team, william.kucharski, kirill.shutemov,
	Song Liu

This set includes a few fixes for THP in page cache. They are based on
Linus's master branch.

Thanks,
Song

Changes v1 -> v2:
1. Return -EINVAL if the WARN() triggers. (Oleg Nesterov)
2. Include William Kucharski's fix to vmscan.c, which replaces half of
   the original patch 3/4.

Kirill A. Shutemov (3):
  proc/meminfo: fix output alignment
  mm/thp: fix node page state in split_huge_page_to_list()
  mm/thp: allow drop THP from page cache

Song Liu (1):
  uprobe: only do FOLL_SPLIT_PMD for uprobe register

William Kucharski (1):
  mm: Support removing arbitrary sized pages from mapping

 fs/proc/meminfo.c       |  4 ++--
 kernel/events/uprobes.c | 13 +++++++++++--
 mm/huge_memory.c        |  9 +++++++--
 mm/truncate.c           | 12 ++++++++++++
 mm/vmscan.c             |  5 +----
 5 files changed, 33 insertions(+), 10 deletions(-)

--
2.17.1

* [PATCH v2 1/5] proc/meminfo: fix output alignment
From: Song Liu @ 2019-10-17 16:42 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kernel-team, william.kucharski, kirill.shutemov,
	Song Liu

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Add extra space for FileHugePages and FilePmdMapped, so the output is
aligned with other rows.

Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 fs/proc/meminfo.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index ac9247371871..8c1f1bb1a5ce 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -132,9 +132,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		    global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR);
 	show_val_kb(m, "ShmemPmdMapped: ",
 		    global_node_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR);
-	show_val_kb(m, "FileHugePages: ",
+	show_val_kb(m, "FileHugePages:  ",
 		    global_node_page_state(NR_FILE_THPS) * HPAGE_PMD_NR);
-	show_val_kb(m, "FilePmdMapped: ",
+	show_val_kb(m, "FilePmdMapped:  ",
 		    global_node_page_state(NR_FILE_PMDMAPPED) * HPAGE_PMD_NR);
 #endif
 
-- 
2.17.1

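The fix works because every label passed to show_val_kb() is expected to occupy the same number of columns before the value field. The sketch below is a user-space model, not kernel code: `format_meminfo_row()` is a hypothetical stand-in for show_val_kb(), and the 16-column label width and 8-column value width are assumptions read off the neighboring rows such as "ShmemPmdMapped: ".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Label column width shared by rows such as "ShmemPmdMapped: " (assumed). */
#define MEMINFO_LABEL_WIDTH 16

/*
 * Hypothetical user-space stand-in for show_val_kb(): writes the label,
 * then the value right-aligned in 8 columns, then " kB". Returns the
 * column where the value field starts, i.e. the label length.
 */
static size_t format_meminfo_row(char *buf, size_t len, const char *label,
				 unsigned long kb)
{
	assert(buf != NULL && label != NULL && len > 0);
	snprintf(buf, len, "%s%8lu kB", label, kb);
	return strlen(label);
}
```

With the old single-space labels, the value field of the FileHugePages and FilePmdMapped rows starts one column earlier than in the other rows, which is the misalignment the patch removes.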

* [PATCH v2 2/5] mm/thp: fix node page state in split_huge_page_to_list()
From: Song Liu @ 2019-10-17 16:42 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kernel-team, william.kucharski, kirill.shutemov,
	Song Liu

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Make sure split_huge_page_to_list() decrements the right node counter
when splitting a THP that has a mapping: NR_SHMEM_THPS for shmem
(swap-backed) THPs and NR_FILE_THPS for file THPs.

Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 mm/huge_memory.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c5cb6dcd6c69..13cc93785006 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2789,8 +2789,13 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			ds_queue->split_queue_len--;
 			list_del(page_deferred_list(head));
 		}
-		if (mapping)
-			__dec_node_page_state(page, NR_SHMEM_THPS);
+		if (mapping) {
+			if (PageSwapBacked(page))
+				__dec_node_page_state(page, NR_SHMEM_THPS);
+			else
+				__dec_node_page_state(page, NR_FILE_THPS);
+		}
+
 		spin_unlock(&ds_queue->split_queue_lock);
 		__split_huge_page(page, list, end, flags);
 		if (PageSwapCache(head)) {
-- 
2.17.1

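The fix hinges on PageSwapBacked() telling shmem THPs apart from file THPs. A minimal user-space model of the patched branch follows; `struct model_page`, `enum model_counter` and `account_split()` are hypothetical names, and the anonymous case is modeled only by the absence of a mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two node counters touched by the patch. */
enum model_counter { MODEL_NR_SHMEM_THPS, MODEL_NR_FILE_THPS, MODEL_NR_COUNTERS };

/* Hypothetical page model: only the bits the patched branch looks at. */
struct model_page {
	bool has_mapping;  /* models "mapping != NULL" (page cache THP) */
	bool swap_backed;  /* models PageSwapBacked(): true for shmem/tmpfs */
};

static long node_counters[MODEL_NR_COUNTERS];

/* Mirrors the patched branch: pick the counter by backing store. */
static void account_split(const struct model_page *page)
{
	assert(page != NULL);
	if (!page->has_mapping)
		return; /* no mapping: neither counter is touched */
	if (page->swap_backed)
		node_counters[MODEL_NR_SHMEM_THPS]--;
	else
		node_counters[MODEL_NR_FILE_THPS]--;
}
```

Before the patch, the model's file-THP case would have decremented the shmem counter instead, leaving NR_FILE_THPS permanently too high.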

* [PATCH v2 3/5] mm: Support removing arbitrary sized pages from mapping
From: Song Liu @ 2019-10-17 16:42 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kernel-team, william.kucharski, kirill.shutemov,
	Matthew Wilcox, Song Liu

From: William Kucharski <william.kucharski@oracle.com>

__remove_mapping() assumes that pages can only be either base pages
or HPAGE_PMD_SIZE.  Ask the page what size it is.

Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 mm/vmscan.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c6659bb758a4..f870da1f4bb7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -932,10 +932,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	 * Note that if SetPageDirty is always performed via set_page_dirty,
 	 * and thus under the i_pages lock, then this ordering is not required.
 	 */
-	if (unlikely(PageTransHuge(page)) && PageSwapCache(page))
-		refcount = 1 + HPAGE_PMD_NR;
-	else
-		refcount = 2;
+	refcount = 1 + compound_nr(page);
 	if (!page_ref_freeze(page, refcount))
 		goto cannot_free;
 	/* note: atomic_cmpxchg in page_ref_freeze provides the smp_rmb */
-- 
2.17.1

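The old rule only produced the right expected count for base pages and for THPs in the swap cache; a file THP in the page cache fell into the `refcount = 2` branch, so page_ref_freeze() would fail on it. The model below compares both rules, assuming one reference held by the caller plus one reference per subpage held by the mapping. The function names and `MODEL_HPAGE_PMD_NR = 512` (x86-64 with 4 KiB base pages) are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumes x86-64: a 2 MiB PMD covers 512 base pages of 4 KiB. */
#define MODEL_HPAGE_PMD_NR 512

/* The expected refcount before the patch. */
static int old_freeze_refcount(bool trans_huge, bool swap_cache)
{
	if (trans_huge && swap_cache)
		return 1 + MODEL_HPAGE_PMD_NR;
	return 2;
}

/*
 * The expected refcount after the patch: the caller's reference plus one
 * per subpage. nr_subpages models compound_nr(page).
 */
static int new_freeze_refcount(int nr_subpages)
{
	assert(nr_subpages >= 1);
	return 1 + nr_subpages;
}
```

The new rule agrees with the old one for base pages and swap-cache THPs, and additionally covers file THPs and compound pages of any order.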

* [PATCH v2 4/5] mm/thp: allow drop THP from page cache
From: Song Liu @ 2019-10-17 16:42 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kernel-team, william.kucharski, kirill.shutemov,
	Song Liu

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Once a THP is added to the page cache, it cannot be dropped via
/proc/sys/vm/drop_caches. Fix this issue with proper handling in
invalidate_mapping_pages().

Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 mm/truncate.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/truncate.c b/mm/truncate.c
index 8563339041f6..dd9ebc1da356 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -592,6 +592,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					unlock_page(page);
 					continue;
 				}
+
+				/* Take a pin outside pagevec */
+				get_page(page);
+
+				/*
+				 * Drop extra pins before trying to invalidate
+				 * the huge page.
+				 */
+				pagevec_remove_exceptionals(&pvec);
+				pagevec_release(&pvec);
 			}
 
 			ret = invalidate_inode_page(page);
@@ -602,6 +612,8 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			 */
 			if (!ret)
 				deactivate_file_page(page);
+			if (PageTransHuge(page))
+				put_page(page);
 			count += ret;
 		}
 		pagevec_remove_exceptionals(&pvec);
-- 
2.17.1

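Why the get_page()/pagevec_release() sequence helps: the lookup pagevec can hold several references to the same THP (one per subpage entry it returned), so the refcount exceeds the single extra pin that the removal path tolerates. The simplified counting model below assumes removal succeeds only when exactly one reference beyond the page cache's per-subpage references remains; `struct thp_model`, `try_invalidate()` and `invalidate_thp()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a THP's head-page refcount during invalidation. */
struct thp_model {
	int nr_subpages; /* references held by the page cache */
	int refcount;    /* total references on the head page */
};

/* Models the freeze check: only one pin beyond the page cache's may remain. */
static bool try_invalidate(const struct thp_model *thp)
{
	return thp->refcount == 1 + thp->nr_subpages;
}

/*
 * pvec_pins: references the lookup pagevec holds on this THP.
 * fixed: use the patched sequence instead of invalidating directly.
 */
static bool invalidate_thp(int nr_subpages, int pvec_pins, bool fixed)
{
	struct thp_model thp = { nr_subpages, nr_subpages + pvec_pins };
	bool ok;

	assert(pvec_pins >= 1);
	if (!fixed)
		return try_invalidate(&thp);

	thp.refcount++;            /* get_page(): pin outside the pagevec */
	thp.refcount -= pvec_pins; /* pagevec_release(): drop its pins */
	ok = try_invalidate(&thp);
	thp.refcount--;            /* put_page() after the attempt */
	return ok;
}
```

In this model the number of pagevec pins no longer matters once the pagevec is released, which is why the patch drops the pagevec before calling invalidate_inode_page().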

* [PATCH v2 5/5] uprobe: only do FOLL_SPLIT_PMD for uprobe register
From: Song Liu @ 2019-10-17 16:42 UTC
  To: linux-kernel, linux-mm, akpm
  Cc: matthew.wilcox, kernel-team, william.kucharski, kirill.shutemov,
	Song Liu, Srikar Dronamraju, Oleg Nesterov

Attaching a uprobe to a text section backed by a THP splits the PMD-mapped
page table entry into PTE-mapped entries. On uprobe detach, we would like
to regroup them into a PMD mapping to regain the performance benefit of
THP.

However, the regroup is broken for perf_event based trace_uprobe. This is
because perf_event based trace_uprobe calls uprobe_unregister twice on
close: first in TRACE_REG_PERF_CLOSE, then in TRACE_REG_PERF_UNREGISTER.
The second call will split the PMD-mapped page table entry, which is not
the desired behavior.

Fix this by using FOLL_SPLIT_PMD only in the uprobe register case.

Add a WARN() to confirm that uprobe unregister never works on huge pages,
and abort the operation when this WARN() triggers.

Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/uprobes.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 94d38a39d72e..c74761004ee5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -474,14 +474,17 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	int ret, is_register, ref_ctr_updated = 0;
 	bool orig_page_huge = false;
+	unsigned int gup_flags = FOLL_FORCE;
 
 	is_register = is_swbp_insn(&opcode);
 	uprobe = container_of(auprobe, struct uprobe, arch);
 
 retry:
+	if (is_register)
+		gup_flags |= FOLL_SPLIT_PMD;
 	/* Read the page with vaddr into memory */
-	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
-			FOLL_FORCE | FOLL_SPLIT_PMD, &old_page, &vma, NULL);
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1, gup_flags,
+				    &old_page, &vma, NULL);
 	if (ret <= 0)
 		return ret;
 
@@ -489,6 +492,12 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret <= 0)
 		goto put_old;
 
+	if (WARN(!is_register && PageCompound(old_page),
+		 "uprobe unregister should never work on compound page\n")) {
+		ret = -EINVAL;
+		goto put_old;
+	}
+
 	/* We are going to replace instruction, update ref_ctr. */
 	if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
 		ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
-- 
2.17.1

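The flag selection and the new sanity check can be modeled in isolation. In this sketch the `MODEL_FOLL_*` values are hypothetical placeholders, not the kernel's FOLL_* constants, and `uprobe_gup_flags()` and `check_old_page()` are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's FOLL_* values. */
#define MODEL_FOLL_FORCE     0x1u
#define MODEL_FOLL_SPLIT_PMD 0x2u

/* Flags for the page lookup: split the PMD only when registering. */
static unsigned int uprobe_gup_flags(bool is_register)
{
	unsigned int gup_flags = MODEL_FOLL_FORCE;

	if (is_register)
		gup_flags |= MODEL_FOLL_SPLIT_PMD;
	return gup_flags;
}

/*
 * Models the new check: unregister should only ever see a PTE-mapped,
 * non-compound page; otherwise bail out with -EINVAL.
 */
static int check_old_page(bool is_register, bool page_compound)
{
	if (!is_register && page_compound)
		return -EINVAL;
	return 0;
}
```

Because OR-ing the same bit more than once is idempotent, recomputing the flags after the `retry:` label, as the patch does, is harmless. A compound page on the register path is still allowed: FOLL_SPLIT_PMD splits only the page table mapping, not the THP itself.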

* Re: [PATCH v2 1/5] proc/meminfo: fix output alignment
From: Yang Shi @ 2019-10-17 17:37 UTC
  To: Song Liu
  Cc: Linux Kernel Mailing List, Linux MM, Andrew Morton,
	matthew.wilcox, kernel-team, william.kucharski,
	Kirill A. Shutemov

On Thu, Oct 17, 2019 at 9:42 AM Song Liu <songliubraving@fb.com> wrote:
>
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Add extra space for FileHugePages and FilePmdMapped, so the output is
> aligned with other rows.
>
> Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Tested-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>

Acked-by: Yang Shi <yang.shi@linux.alibaba.com>

> ---
>  fs/proc/meminfo.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index ac9247371871..8c1f1bb1a5ce 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -132,9 +132,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>                     global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR);
>         show_val_kb(m, "ShmemPmdMapped: ",
>                     global_node_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR);
> -       show_val_kb(m, "FileHugePages: ",
> +       show_val_kb(m, "FileHugePages:  ",
>                     global_node_page_state(NR_FILE_THPS) * HPAGE_PMD_NR);
> -       show_val_kb(m, "FilePmdMapped: ",
> +       show_val_kb(m, "FilePmdMapped:  ",
>                     global_node_page_state(NR_FILE_PMDMAPPED) * HPAGE_PMD_NR);
>  #endif
>
> --
> 2.17.1
>
>

* Re: [PATCH v2 2/5] mm/thp: fix node page state in split_huge_page_to_list()
From: Yang Shi @ 2019-10-17 17:38 UTC
  To: Song Liu
  Cc: Linux Kernel Mailing List, Linux MM, Andrew Morton,
	matthew.wilcox, kernel-team, william.kucharski,
	Kirill A. Shutemov

On Thu, Oct 17, 2019 at 9:42 AM Song Liu <songliubraving@fb.com> wrote:
>
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Make sure split_huge_page_to_list() decrements the right node counter
> when splitting a THP that has a mapping: NR_SHMEM_THPS for shmem
> (swap-backed) THPs and NR_FILE_THPS for file THPs.
>
> Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Tested-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>

Acked-by: Yang Shi <yang.shi@linux.alibaba.com>

> ---
>  mm/huge_memory.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c5cb6dcd6c69..13cc93785006 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2789,8 +2789,13 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>                         ds_queue->split_queue_len--;
>                         list_del(page_deferred_list(head));
>                 }
> -               if (mapping)
> -                       __dec_node_page_state(page, NR_SHMEM_THPS);
> +               if (mapping) {
> +                       if (PageSwapBacked(page))
> +                               __dec_node_page_state(page, NR_SHMEM_THPS);
> +                       else
> +                               __dec_node_page_state(page, NR_FILE_THPS);
> +               }
> +
>                 spin_unlock(&ds_queue->split_queue_lock);
>                 __split_huge_page(page, list, end, flags);
>                 if (PageSwapCache(head)) {
> --
> 2.17.1
>
>

* Re: [PATCH v2 3/5] mm: Support removing arbitrary sized pages from mapping
From: Yang Shi @ 2019-10-17 17:43 UTC
  To: Song Liu
  Cc: Linux Kernel Mailing List, Linux MM, Andrew Morton,
	matthew.wilcox, kernel-team, william.kucharski,
	Kirill A. Shutemov, Matthew Wilcox

On Thu, Oct 17, 2019 at 9:42 AM Song Liu <songliubraving@fb.com> wrote:
>
> From: William Kucharski <william.kucharski@oracle.com>
>
> __remove_mapping() assumes that pages can only be either base pages
> or HPAGE_PMD_SIZE.  Ask the page what size it is.
>
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> Signed-off-by: William Kucharski <william.kucharski@oracle.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Song Liu <songliubraving@fb.com>

Acked-by: Yang Shi <yang.shi@linux.alibaba.com>

> ---
>  mm/vmscan.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c6659bb758a4..f870da1f4bb7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -932,10 +932,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
>          * Note that if SetPageDirty is always performed via set_page_dirty,
>          * and thus under the i_pages lock, then this ordering is not required.
>          */
> -       if (unlikely(PageTransHuge(page)) && PageSwapCache(page))
> -               refcount = 1 + HPAGE_PMD_NR;
> -       else
> -               refcount = 2;
> +       refcount = 1 + compound_nr(page);
>         if (!page_ref_freeze(page, refcount))
>                 goto cannot_free;
>         /* note: atomic_cmpxchg in page_ref_freeze provides the smp_rmb */
> --
> 2.17.1
>
>

* Re: [PATCH v2 4/5] mm/thp: allow drop THP from page cache
From: Yang Shi @ 2019-10-17 21:46 UTC
  To: Song Liu
  Cc: Linux Kernel Mailing List, Linux MM, Andrew Morton,
	matthew.wilcox, kernel-team, william.kucharski,
	Kirill A. Shutemov

On Thu, Oct 17, 2019 at 9:42 AM Song Liu <songliubraving@fb.com> wrote:
>
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Once a THP is added to the page cache, it cannot be dropped via
> /proc/sys/vm/drop_caches. Fix this issue with proper handling in
> invalidate_mapping_pages().
>
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Tested-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  mm/truncate.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 8563339041f6..dd9ebc1da356 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -592,6 +592,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>                                         unlock_page(page);
>                                         continue;
>                                 }
> +
> +                               /* Take a pin outside pagevec */
> +                               get_page(page);
> +
> +                               /*
> +                                * Drop extra pins before trying to invalidate
> +                                * the huge page.
> +                                */
> +                               pagevec_remove_exceptionals(&pvec);
> +                               pagevec_release(&pvec);

Shall we skip the outer pagevec_remove_exceptionals() if it has been done here?

>                         }
>
>                         ret = invalidate_inode_page(page);
> @@ -602,6 +612,8 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>                          */
>                         if (!ret)
>                                 deactivate_file_page(page);
> +                       if (PageTransHuge(page))
> +                               put_page(page);
>                         count += ret;
>                 }
>                 pagevec_remove_exceptionals(&pvec);
> --
> 2.17.1
>
>

* Re: [PATCH v2 5/5] uprobe: only do FOLL_SPLIT_PMD for uprobe register
From: Srikar Dronamraju @ 2019-10-18 12:52 UTC
  To: Song Liu
  Cc: linux-kernel, linux-mm, akpm, matthew.wilcox, kernel-team,
	william.kucharski, kirill.shutemov, Oleg Nesterov

* Song Liu <songliubraving@fb.com> [2019-10-17 09:42:22]:

> Attaching a uprobe to a text section backed by a THP splits the PMD-mapped
> page table entry into PTE-mapped entries. On uprobe detach, we would like
> to regroup them into a PMD mapping to regain the performance benefit of
> THP.
> 
> However, the regroup is broken for perf_event based trace_uprobe. This is
> because perf_event based trace_uprobe calls uprobe_unregister twice on
> close: first in TRACE_REG_PERF_CLOSE, then in TRACE_REG_PERF_UNREGISTER.
> The second call will split the PMD-mapped page table entry, which is not
> the desired behavior.
> 
> Fix this by using FOLL_SPLIT_PMD only in the uprobe register case.
> 
> Add a WARN() to confirm that uprobe unregister never works on huge pages,
> and abort the operation when this WARN() triggers.
> 
> Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

>  kernel/events/uprobes.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 94d38a39d72e..c74761004ee5 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -474,14 +474,17 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  	struct vm_area_struct *vma;
>  	int ret, is_register, ref_ctr_updated = 0;
>  	bool orig_page_huge = false;
> +	unsigned int gup_flags = FOLL_FORCE;
> 
>  	is_register = is_swbp_insn(&opcode);
>  	uprobe = container_of(auprobe, struct uprobe, arch);
> 
>  retry:
> +	if (is_register)
> +		gup_flags |= FOLL_SPLIT_PMD;
>  	/* Read the page with vaddr into memory */
> -	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
> -			FOLL_FORCE | FOLL_SPLIT_PMD, &old_page, &vma, NULL);
> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, gup_flags,
> +				    &old_page, &vma, NULL);
>  	if (ret <= 0)
>  		return ret;
> 
> @@ -489,6 +492,12 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  	if (ret <= 0)
>  		goto put_old;
> 
> +	if (WARN(!is_register && PageCompound(old_page),
> +		 "uprobe unregister should never work on compound page\n")) {
> +		ret = -EINVAL;
> +		goto put_old;
> +	}
> +
>  	/* We are going to replace instruction, update ref_ctr. */
>  	if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
>  		ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
> -- 
> 2.17.1
> 

-- 
Thanks and Regards
Srikar Dronamraju


* Re: [PATCH v2 4/5] mm/thp: allow drop THP from page cache
From: Kirill A. Shutemov @ 2019-10-18 13:32 UTC
  To: Yang Shi
  Cc: Song Liu, Linux Kernel Mailing List, Linux MM, Andrew Morton,
	matthew.wilcox, kernel-team, william.kucharski,
	Kirill A. Shutemov

On Thu, Oct 17, 2019 at 02:46:38PM -0700, Yang Shi wrote:
> On Thu, Oct 17, 2019 at 9:42 AM Song Liu <songliubraving@fb.com> wrote:
> >
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > Once a THP is added to the page cache, it cannot be dropped via
> > /proc/sys/vm/drop_caches. Fix this issue with proper handling in
> > invalidate_mapping_pages().
> >
> > Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Tested-by: Song Liu <songliubraving@fb.com>
> > Signed-off-by: Song Liu <songliubraving@fb.com>
> > ---
> >  mm/truncate.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index 8563339041f6..dd9ebc1da356 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -592,6 +592,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> >                                         unlock_page(page);
> >                                         continue;
> >                                 }
> > +
> > +                               /* Take a pin outside pagevec */
> > +                               get_page(page);
> > +
> > +                               /*
> > +                                * Drop extra pins before trying to invalidate
> > +                                * the huge page.
> > +                                */
> > +                               pagevec_remove_exceptionals(&pvec);
> > +                               pagevec_release(&pvec);
> 
> Shall we skip the outer pagevec_remove_exceptionals() if it has been done here?

It will be a no-op, and skipping it would complicate the code.

-- 
 Kirill A. Shutemov

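Kirill's point can be checked with a toy model: once pagevec_release() has emptied the pagevec, the later pagevec_remove_exceptionals() loop has no entries to walk. This sketch assumes the helper's loop is bounded by the pagevec count and that release resets the count to zero; `struct model_pagevec` and the `model_*` functions are hypothetical, and MODEL_PAGEVEC_SIZE = 15 is an assumed capacity.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGEVEC_SIZE 15 /* assumed capacity */

/* Hypothetical pagevec model: each slot is a page or an exceptional entry. */
struct model_pagevec {
	unsigned int nr;
	bool exceptional[MODEL_PAGEVEC_SIZE];
};

/* Compacts out exceptional entries, walking only the first nr slots. */
static void model_remove_exceptionals(struct model_pagevec *pvec)
{
	unsigned int i, j;

	assert(pvec->nr <= MODEL_PAGEVEC_SIZE);
	for (i = 0, j = 0; i < pvec->nr; i++) {
		if (!pvec->exceptional[i])
			pvec->exceptional[j++] = false;
	}
	pvec->nr = j;
}

/* Models pagevec_release(): drop the references and empty the pagevec. */
static void model_release(struct model_pagevec *pvec)
{
	pvec->nr = 0;
}
```

After model_release(), the loop condition `i < pvec->nr` is false on entry, so the second call changes nothing; guarding it with an extra flag would only add a branch.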
* Re: [PATCH v2 4/5] mm/thp: allow drop THP from page cache
From: Yang Shi @ 2019-10-18 21:54 UTC
  To: Kirill A. Shutemov
  Cc: Song Liu, Linux Kernel Mailing List, Linux MM, Andrew Morton,
	matthew.wilcox, kernel-team, william.kucharski,
	Kirill A. Shutemov

On Fri, Oct 18, 2019 at 6:32 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Thu, Oct 17, 2019 at 02:46:38PM -0700, Yang Shi wrote:
> > On Thu, Oct 17, 2019 at 9:42 AM Song Liu <songliubraving@fb.com> wrote:
> > >
> > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > >
> > > Once a THP is added to the page cache, it cannot be dropped via
> > > /proc/sys/vm/drop_caches. Fix this issue with proper handling in
> > > invalidate_mapping_pages().
> > >
> > > Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Tested-by: Song Liu <songliubraving@fb.com>
> > > Signed-off-by: Song Liu <songliubraving@fb.com>
> > > ---
> > >  mm/truncate.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/mm/truncate.c b/mm/truncate.c
> > > index 8563339041f6..dd9ebc1da356 100644
> > > --- a/mm/truncate.c
> > > +++ b/mm/truncate.c
> > > @@ -592,6 +592,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > >                                         unlock_page(page);
> > >                                         continue;
> > >                                 }
> > > +
> > > +                               /* Take a pin outside pagevec */
> > > +                               get_page(page);
> > > +
> > > +                               /*
> > > +                                * Drop extra pins before trying to invalidate
> > > +                                * the huge page.
> > > +                                */
> > > +                               pagevec_remove_exceptionals(&pvec);
> > > +                               pagevec_release(&pvec);
> >
> > Shall we skip the outer pagevec_remove_exceptionals() if it has been done here?
>
> It will be a no-op, and skipping it would complicate the code.

Yes, it would. Anyway, it looks OK as is.

Acked-by: Yang Shi <yang.shi@linux.alibaba.com>

>
> --
>  Kirill A. Shutemov
