* [PATCH v2 0/5] Cleanup and fixup for huge_memory
@ 2021-04-29 13:26 Miaohe Lin
  2021-04-29 13:26 ` [PATCH v2 1/5] mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK Miaohe Lin
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Miaohe Lin @ 2021-04-29 13:26 UTC (permalink / raw)
  To: akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm, linmiaohe

Hi all,
This series contains cleanups to remove a dedicated macro and to drop the
unnecessary tlb_remove_page_size() for the huge zero pmd. It also adds the
missing read-only THP check to transparent_hugepage_enabled() and avoids
discarding a hugepage while other processes are still mapping it. More
details can be found in the respective changelogs. Thanks!

v1->v2:
  collect Reviewed-by tags
  add missing check for read-only THP per Yang Shi

Miaohe Lin (5):
  mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
  mm/huge_memory.c: use page->deferred_list
  mm/huge_memory.c: add missing read-only THP checking in
    transparent_hugepage_enabled()
  mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge
    zero pmd
  mm/huge_memory.c: don't discard hugepage if other processes are
    mapping it

 include/linux/huge_mm.h | 27 +++++++++++++++++++--------
 mm/huge_memory.c        | 15 +++++++++------
 mm/khugepaged.c         |  4 +---
 mm/shmem.c              |  3 +--
 4 files changed, 30 insertions(+), 19 deletions(-)

-- 
2.23.0



* [PATCH v2 1/5] mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
  2021-04-29 13:26 [PATCH v2 0/5] Cleanup and fixup for huge_memory Miaohe Lin
@ 2021-04-29 13:26 ` Miaohe Lin
  2021-04-29 14:48   ` David Hildenbrand
  2021-04-29 13:26 ` [PATCH v2 2/5] mm/huge_memory.c: use page->deferred_list Miaohe Lin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Miaohe Lin @ 2021-04-29 13:26 UTC (permalink / raw)
  To: akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm, linmiaohe

Rewrite the pgoff checking logic to remove the macro HPAGE_CACHE_INDEX_MASK,
which is only used here, in order to simplify the code.

Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/huge_mm.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 9626fda5efce..0a526f211fec 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -152,15 +152,13 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 
 bool transparent_hugepage_enabled(struct vm_area_struct *vma);
 
-#define HPAGE_CACHE_INDEX_MASK (HPAGE_PMD_NR - 1)
-
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 		unsigned long haddr)
 {
 	/* Don't have to check pgoff for anonymous vma */
 	if (!vma_is_anonymous(vma)) {
-		if (((vma->vm_start >> PAGE_SHIFT) & HPAGE_CACHE_INDEX_MASK) !=
-			(vma->vm_pgoff & HPAGE_CACHE_INDEX_MASK))
+		if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
+				HPAGE_PMD_NR))
 			return false;
 	}
 
-- 
2.23.0
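
For reference, a quick userspace sketch (not kernel code; HPAGE_PMD_NR and
IS_ALIGNED() are simplified local stand-ins for the kernel definitions,
assuming a power-of-two HPAGE_PMD_NR) showing that the old mask-based test
and the new IS_ALIGNED() form are equivalent: two values have the same low
bits modulo HPAGE_PMD_NR exactly when their difference is a multiple of
HPAGE_PMD_NR.

/*
 * Userspace check (not kernel code) that the old mask-based test and the
 * new IS_ALIGNED() form always agree.
 */
#include <stdio.h>
#include <stdbool.h>

#define HPAGE_PMD_NR		512UL
#define HPAGE_CACHE_INDEX_MASK	(HPAGE_PMD_NR - 1)
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

static bool old_check(unsigned long start_pfn, unsigned long pgoff)
{
	return (start_pfn & HPAGE_CACHE_INDEX_MASK) ==
	       (pgoff & HPAGE_CACHE_INDEX_MASK);
}

static bool new_check(unsigned long start_pfn, unsigned long pgoff)
{
	return IS_ALIGNED(start_pfn - pgoff, HPAGE_PMD_NR);
}

int main(void)
{
	unsigned long a, b;

	for (a = 0; a < 2 * HPAGE_PMD_NR; a++)
		for (b = 0; b < 2 * HPAGE_PMD_NR; b++)
			if (old_check(a, b) != new_check(a, b)) {
				printf("mismatch at %lu, %lu\n", a, b);
				return 1;
			}
	printf("old and new pgoff checks agree\n");
	return 0;
}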



* [PATCH v2 2/5] mm/huge_memory.c: use page->deferred_list
  2021-04-29 13:26 [PATCH v2 0/5] Cleanup and fixup for huge_memory Miaohe Lin
  2021-04-29 13:26 ` [PATCH v2 1/5] mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK Miaohe Lin
@ 2021-04-29 13:26 ` Miaohe Lin
  2021-04-29 14:52   ` David Hildenbrand
  2021-04-29 13:26 ` [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled() Miaohe Lin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Miaohe Lin @ 2021-04-29 13:26 UTC (permalink / raw)
  To: akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm, linmiaohe

Now that we can refer to the location of ->deferred_list directly instead of
going through ->mapping + ->index, make use of it to improve readability.

Reviewed-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 63ed6b25deaa..76ca1eb2a223 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2868,7 +2868,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_safe(pos, next, &ds_queue->split_queue) {
-		page = list_entry((void *)pos, struct page, mapping);
+		page = list_entry((void *)pos, struct page, deferred_list);
 		page = compound_head(page);
 		if (get_page_unless_zero(page)) {
 			list_move(page_deferred_list(page), &list);
@@ -2883,7 +2883,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
 	list_for_each_safe(pos, next, &list) {
-		page = list_entry((void *)pos, struct page, mapping);
+		page = list_entry((void *)pos, struct page, deferred_list);
 		if (!trylock_page(page))
 			goto next;
 		/* split_huge_page() removes page from list on success */
-- 
2.23.0



* [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
  2021-04-29 13:26 [PATCH v2 0/5] Cleanup and fixup for huge_memory Miaohe Lin
  2021-04-29 13:26 ` [PATCH v2 1/5] mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK Miaohe Lin
  2021-04-29 13:26 ` [PATCH v2 2/5] mm/huge_memory.c: use page->deferred_list Miaohe Lin
@ 2021-04-29 13:26 ` Miaohe Lin
  2021-04-29 14:57   ` David Hildenbrand
  2021-04-29 13:26 ` [PATCH v2 4/5] mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd Miaohe Lin
  2021-04-29 13:26 ` [PATCH v2 5/5] mm/huge_memory.c: don't discard hugepage if other processes are mapping it Miaohe Lin
  4 siblings, 1 reply; 16+ messages in thread
From: Miaohe Lin @ 2021-04-29 13:26 UTC (permalink / raw)
  To: akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm, linmiaohe

Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
(non-shmem) FS"), read-only THP file mappings are supported. But that
commit forgot to add the corresponding check to
transparent_hugepage_enabled(). Fix it by adding a check for read-only
THP file mappings, and introduce the helper transhuge_vma_enabled() to
check whether THP is enabled for a given vma, which also reduces
duplicated code.

Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/huge_mm.h | 21 +++++++++++++++++----
 mm/huge_memory.c        |  6 ++++++
 mm/khugepaged.c         |  4 +---
 mm/shmem.c              |  3 +--
 4 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0a526f211fec..f460b74619fc 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -115,6 +115,16 @@ extern struct kobj_attribute shmem_enabled_attr;
 
 extern unsigned long transparent_hugepage_flags;
 
+static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
+					  unsigned long vm_flags)
+{
+	/* Explicitly disabled through madvise. */
+	if ((vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
+	return true;
+}
+
 /*
  * to be used on vmas which are known to support THP.
  * Use transparent_hugepage_enabled otherwise
@@ -128,15 +138,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
 		return false;
 
-	if (vma->vm_flags & VM_NOHUGEPAGE)
+	if (!transhuge_vma_enabled(vma, vma->vm_flags))
 		return false;
 
 	if (vma_is_temporary_stack(vma))
 		return false;
 
-	if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return false;
-
 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
 		return true;
 
@@ -362,6 +369,12 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 	return false;
 }
 
+static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
+					  unsigned long vm_flags)
+{
+	return false;
+}
+
 static inline void prep_transhuge_page(struct page *page) {}
 
 static inline bool is_transparent_hugepage(struct page *page)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 76ca1eb2a223..e24a96de2e37 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -68,12 +68,18 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 	/* The addr is used to check if the vma size fits */
 	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
 
+	if (!transhuge_vma_enabled(vma, vma->vm_flags))
+		return false;
 	if (!transhuge_vma_suitable(vma, addr))
 		return false;
 	if (vma_is_anonymous(vma))
 		return __transparent_hugepage_enabled(vma);
 	if (vma_is_shmem(vma))
 		return shmem_huge_enabled(vma);
+	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
+	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
+	    (vma->vm_flags & VM_EXEC))
+		return true;
 
 	return false;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6c0185fdd815..d97b20fad6e8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -442,9 +442,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 static bool hugepage_vma_check(struct vm_area_struct *vma,
 			       unsigned long vm_flags)
 {
-	/* Explicitly disabled through madvise. */
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+	if (!transhuge_vma_enabled(vma, vm_flags))
 		return false;
 
 	/* Enabled via shmem mount options or sysfs settings. */
diff --git a/mm/shmem.c b/mm/shmem.c
index a08cedefbfaa..1dcbec313c70 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4032,8 +4032,7 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
 	loff_t i_size;
 	pgoff_t off;
 
-	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+	if (!transhuge_vma_enabled(vma, vma->vm_flags))
 		return false;
 	if (shmem_huge == SHMEM_HUGE_FORCE)
 		return true;
-- 
2.23.0



* [PATCH v2 4/5] mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd
  2021-04-29 13:26 [PATCH v2 0/5] Cleanup and fixup for huge_memory Miaohe Lin
                   ` (2 preceding siblings ...)
  2021-04-29 13:26 ` [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled() Miaohe Lin
@ 2021-04-29 13:26 ` Miaohe Lin
  2021-04-29 15:02   ` David Hildenbrand
  2021-04-29 17:55   ` Yang Shi
  2021-04-29 13:26 ` [PATCH v2 5/5] mm/huge_memory.c: don't discard hugepage if other processes are mapping it Miaohe Lin
  4 siblings, 2 replies; 16+ messages in thread
From: Miaohe Lin @ 2021-04-29 13:26 UTC (permalink / raw)
  To: akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm, linmiaohe

Commit aa88b68c3b1d ("thp: keep huge zero page pinned until tlb flush")
introduced tlb_remove_page() for the huge zero page to keep it pinned
until the flush is complete and to prevent the page from being split
underneath us. But since commit 6fcb52a56ff6 ("thp: reduce usage of huge
zero page's atomic counter"), the huge zero page is kept pinned until all
relevant mm_users reach zero. So tlb_remove_page_size() for the huge zero
pmd is unnecessary now.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/huge_memory.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e24a96de2e37..af30338ac49c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1680,12 +1680,9 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
-		if (is_huge_zero_pmd(orig_pmd))
-			tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
 	} else if (is_huge_zero_pmd(orig_pmd)) {
 		zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
 	} else {
 		struct page *page = NULL;
 		int flush_needed = 1;
-- 
2.23.0
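
To make the lifetime argument above concrete, here is a toy userspace model
(not kernel code; the helper names only mirror mm_get_huge_zero_page() and
mmput(), and the refcounting is deliberately simplified): once the pin on the
huge zero page is held per-mm and dropped only when the last mm user goes
away, zapping a pmd can never drop the final reference before the TLB flush,
so the extra per-zap pin is redundant.

#include <stdio.h>

static int huge_zero_refcount;	/* toy stand-in for the zero page's refcount */
static int mm_users = 1;	/* toy stand-in for mm->mm_users */
static int mm_has_pin;		/* per-mm pin, as after commit 6fcb52a56ff6 */

static void mm_get_huge_zero_page(void)
{
	if (!mm_has_pin) {
		huge_zero_refcount++;	/* one pin for the whole mm */
		mm_has_pin = 1;
	}
}

static void zap_huge_zero_pmd(void)
{
	/* Nothing to drop: the mm-level pin is untouched by a pmd zap. */
}

static void mmput(void)
{
	if (--mm_users == 0 && mm_has_pin) {
		huge_zero_refcount--;	/* pin dropped only at mm teardown */
		mm_has_pin = 0;
	}
}

int main(void)
{
	mm_get_huge_zero_page();	/* first read fault installs the zero pmd */
	zap_huge_zero_pmd();		/* e.g. munmap() of that range */
	printf("refcount after zap:   %d\n", huge_zero_refcount);	/* 1 */
	mmput();
	printf("refcount after mmput: %d\n", huge_zero_refcount);	/* 0 */
	return 0;
}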



* [PATCH v2 5/5] mm/huge_memory.c: don't discard hugepage if other processes are mapping it
  2021-04-29 13:26 [PATCH v2 0/5] Cleanup and fixup for huge_memory Miaohe Lin
                   ` (3 preceding siblings ...)
  2021-04-29 13:26 ` [PATCH v2 4/5] mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd Miaohe Lin
@ 2021-04-29 13:26 ` Miaohe Lin
  4 siblings, 0 replies; 16+ messages in thread
From: Miaohe Lin @ 2021-04-29 13:26 UTC (permalink / raw)
  To: akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm, linmiaohe

If other processes are mapping any other subpages of the hugepage, i.e. in
the pte-mapped THP case, page_mapcount() will incorrectly return 1. We
would then discard the page while other processes are still mapping it.
Fix it by using total_mapcount(), which accounts for the mappings of all
subpages and so can tell whether other processes are still mapping it.

Fixes: b8d3c4c3009d ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called")
Reviewed-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index af30338ac49c..87b0241394f4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1607,7 +1607,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * If other processes are mapping this page, we couldn't discard
 	 * the page unless they all do MADV_FREE so let's skip the page.
 	 */
-	if (page_mapcount(page) != 1)
+	if (total_mapcount(page) != 1)
 		goto out;
 
 	if (!trylock_page(page))
-- 
2.23.0
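
As an illustration of the mapcount distinction this fix relies on, here is a
toy userspace model (not kernel code; it does not reproduce the kernel's
_mapcount/compound_mapcount encoding): page_mapcount() on the head page only
reflects whole-THP mappings plus pte mappings of the head subpage, so a pte
mapping of some other subpage by another process is invisible to it, while
total_mapcount() sums the mappings of every subpage.

#include <stdio.h>

#define HPAGE_PMD_NR 512	/* subpages in a pmd-sized THP */

struct toy_thp {
	int compound_mapcount;			/* whole-THP (pmd) mappings */
	int subpage_mapcount[HPAGE_PMD_NR];	/* pte mappings per subpage */
};

/* Roughly what page_mapcount(head) sees: pmd maps + head-subpage pte maps. */
static int toy_page_mapcount(const struct toy_thp *p)
{
	return p->compound_mapcount + p->subpage_mapcount[0];
}

/* Roughly what total_mapcount() sees: pmd maps + pte maps of every subpage. */
static int toy_total_mapcount(const struct toy_thp *p)
{
	int i, total = p->compound_mapcount;

	for (i = 0; i < HPAGE_PMD_NR; i++)
		total += p->subpage_mapcount[i];
	return total;
}

int main(void)
{
	struct toy_thp thp = { 0 };

	thp.compound_mapcount = 1;	/* the MADV_FREE caller maps it by pmd */
	thp.subpage_mapcount[5] = 1;	/* another process pte-maps subpage 5 */

	/* page_mapcount(head) == 1 would wrongly allow discarding the page. */
	printf("page_mapcount(head) = %d\n", toy_page_mapcount(&thp));
	printf("total_mapcount()    = %d\n", toy_total_mapcount(&thp));
	return 0;
}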



* Re: [PATCH v2 1/5] mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
  2021-04-29 13:26 ` [PATCH v2 1/5] mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK Miaohe Lin
@ 2021-04-29 14:48   ` David Hildenbrand
  2021-04-30  1:39     ` Miaohe Lin
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2021-04-29 14:48 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm

On 29.04.21 15:26, Miaohe Lin wrote:
> Rewrite the pgoff checking logic to remove macro HPAGE_CACHE_INDEX_MASK
> which is only used here to simplify the code.
> 
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   include/linux/huge_mm.h | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 9626fda5efce..0a526f211fec 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -152,15 +152,13 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>   
>   bool transparent_hugepage_enabled(struct vm_area_struct *vma);
>   
> -#define HPAGE_CACHE_INDEX_MASK (HPAGE_PMD_NR - 1)
> -
>   static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>   		unsigned long haddr)
>   {
>   	/* Don't have to check pgoff for anonymous vma */
>   	if (!vma_is_anonymous(vma)) {
> -		if (((vma->vm_start >> PAGE_SHIFT) & HPAGE_CACHE_INDEX_MASK) !=
> -			(vma->vm_pgoff & HPAGE_CACHE_INDEX_MASK))
> +		if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> +				HPAGE_PMD_NR))
>   			return false;

I'd have used

if (!IS_ALIGNED(PHYS_PFN(vma->vm_start) - vma->vm_pgoff,

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/5] mm/huge_memory.c: use page->deferred_list
  2021-04-29 13:26 ` [PATCH v2 2/5] mm/huge_memory.c: use page->deferred_list Miaohe Lin
@ 2021-04-29 14:52   ` David Hildenbrand
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2021-04-29 14:52 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm

On 29.04.21 15:26, Miaohe Lin wrote:
> Now that we can represent the location of ->deferred_list instead of
> ->mapping + ->index, make use of it to improve readability.
> 
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   mm/huge_memory.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 63ed6b25deaa..76ca1eb2a223 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2868,7 +2868,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>   	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>   	/* Take pin on all head pages to avoid freeing them under us */
>   	list_for_each_safe(pos, next, &ds_queue->split_queue) {
> -		page = list_entry((void *)pos, struct page, mapping);
> +		page = list_entry((void *)pos, struct page, deferred_list);
>   		page = compound_head(page);
>   		if (get_page_unless_zero(page)) {
>   			list_move(page_deferred_list(page), &list);
> @@ -2883,7 +2883,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>   	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>   
>   	list_for_each_safe(pos, next, &list) {
> -		page = list_entry((void *)pos, struct page, mapping);
> +		page = list_entry((void *)pos, struct page, deferred_list);
>   		if (!trylock_page(page))
>   			goto next;
>   		/* split_huge_page() removes page from list on success */
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
  2021-04-29 13:26 ` [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled() Miaohe Lin
@ 2021-04-29 14:57   ` David Hildenbrand
  2021-04-29 16:23     ` Yang Shi
  2021-04-30  1:57     ` Miaohe Lin
  0 siblings, 2 replies; 16+ messages in thread
From: David Hildenbrand @ 2021-04-29 14:57 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm

On 29.04.21 15:26, Miaohe Lin wrote:
> Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
> (non-shmem) FS"), read-only THP file mapping is supported. But it
> forgot to add checking for it in transparent_hugepage_enabled().
> To fix it, we add checking for read-only THP file mapping and also
> introduce helper transhuge_vma_enabled() to check whether thp is
> enabled for specified vma to reduce duplicated code.
> 
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   include/linux/huge_mm.h | 21 +++++++++++++++++----
>   mm/huge_memory.c        |  6 ++++++
>   mm/khugepaged.c         |  4 +---
>   mm/shmem.c              |  3 +--
>   4 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 0a526f211fec..f460b74619fc 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -115,6 +115,16 @@ extern struct kobj_attribute shmem_enabled_attr;
>   
>   extern unsigned long transparent_hugepage_flags;
>   
> +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> +					  unsigned long vm_flags)

You're passing the vma already, why do you pass vma->vm_flags 
separately? It's sufficient to pass in the vma only.

> +{
> +	/* Explicitly disabled through madvise. */
> +	if ((vm_flags & VM_NOHUGEPAGE) ||
> +	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> +		return false;
> +	return true;
> +}
> +
>   /*
>    * to be used on vmas which are known to support THP.
>    * Use transparent_hugepage_enabled otherwise
> @@ -128,15 +138,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>   	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
>   		return false;
>   
> -	if (vma->vm_flags & VM_NOHUGEPAGE)
> +	if (!transhuge_vma_enabled(vma, vma->vm_flags))
>   		return false;
>   
>   	if (vma_is_temporary_stack(vma))
>   		return false;
>   
> -	if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> -		return false;
> -
>   	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>   		return true;
>   
> @@ -362,6 +369,12 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>   	return false;
>   }
>   
> +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> +					  unsigned long vm_flags)
> +{
> +	return false;
> +}
> +
>   static inline void prep_transhuge_page(struct page *page) {}
>   
>   static inline bool is_transparent_hugepage(struct page *page)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 76ca1eb2a223..e24a96de2e37 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -68,12 +68,18 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>   	/* The addr is used to check if the vma size fits */
>   	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
>   
> +	if (!transhuge_vma_enabled(vma, vma->vm_flags))
> +		return false;
>   	if (!transhuge_vma_suitable(vma, addr))
>   		return false;
>   	if (vma_is_anonymous(vma))
>   		return __transparent_hugepage_enabled(vma);
>   	if (vma_is_shmem(vma))
>   		return shmem_huge_enabled(vma);
> +	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> +	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
> +	    (vma->vm_flags & VM_EXEC))
> +		return true;

Nit: I'm really wondering why we have 3 different functions that sound 
like they are doing the same thing

transparent_hugepage_enabled(vma)
transhuge_vma_enabled()
transhuge_vma_suitable()

Which check belongs where? Does it really have to be that complicated?

>   
>   	return false;
>   }
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6c0185fdd815..d97b20fad6e8 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -442,9 +442,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
>   static bool hugepage_vma_check(struct vm_area_struct *vma,
>   			       unsigned long vm_flags)
>   {
> -	/* Explicitly disabled through madvise. */
> -	if ((vm_flags & VM_NOHUGEPAGE) ||
> -	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> +	if (!transhuge_vma_enabled(vma, vm_flags))
>   		return false;
>   
>   	/* Enabled via shmem mount options or sysfs settings. */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a08cedefbfaa..1dcbec313c70 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -4032,8 +4032,7 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>   	loff_t i_size;
>   	pgoff_t off;
>   
> -	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
> -	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> +	if (!transhuge_vma_enabled(vma, vma->vm_flags))
>   		return false;
>   	if (shmem_huge == SHMEM_HUGE_FORCE)
>   		return true;
> 


-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 4/5] mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd
  2021-04-29 13:26 ` [PATCH v2 4/5] mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd Miaohe Lin
@ 2021-04-29 15:02   ` David Hildenbrand
  2021-04-29 17:55   ` Yang Shi
  1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2021-04-29 15:02 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm, Aaron Lu

On 29.04.21 15:26, Miaohe Lin wrote:
> Commit aa88b68c3b1d ("thp: keep huge zero page pinned until tlb flush")
> introduced tlb_remove_page() for huge zero page to keep it pinned until
> flush is complete and prevents the page from being split under us. But
> huge zero page is kept pinned until all relevant mm_users reach zero since
> the commit 6fcb52a56ff6 ("thp: reduce usage of huge zero page's atomic
> counter"). So tlb_remove_page_size() for huge zero pmd is unnecessary now.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   mm/huge_memory.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e24a96de2e37..af30338ac49c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1680,12 +1680,9 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>   		if (arch_needs_pgtable_deposit())
>   			zap_deposited_table(tlb->mm, pmd);
>   		spin_unlock(ptl);
> -		if (is_huge_zero_pmd(orig_pmd))
> -			tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>   	} else if (is_huge_zero_pmd(orig_pmd)) {
>   		zap_deposited_table(tlb->mm, pmd);
>   		spin_unlock(ptl);
> -		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>   	} else {
>   		struct page *page = NULL;
>   		int flush_needed = 1;
> 

This sounds sane to me

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
  2021-04-29 14:57   ` David Hildenbrand
@ 2021-04-29 16:23     ` Yang Shi
  2021-04-30  1:57     ` Miaohe Lin
  1 sibling, 0 replies; 16+ messages in thread
From: Yang Shi @ 2021-04-29 16:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Miaohe Lin, Andrew Morton, Zi Yan, william.kucharski,
	Matthew Wilcox, Yang Shi, aneesh.kumar, Ralph Campbell, Song Liu,
	Kirill A. Shutemov, Rik van Riel, Johannes Weiner, Minchan Kim,
	Linux Kernel Mailing List, Linux MM

On Thu, Apr 29, 2021 at 7:57 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 29.04.21 15:26, Miaohe Lin wrote:
> > Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
> > (non-shmem) FS"), read-only THP file mapping is supported. But it
> > forgot to add checking for it in transparent_hugepage_enabled().
> > To fix it, we add checking for read-only THP file mapping and also
> > introduce helper transhuge_vma_enabled() to check whether thp is
> > enabled for specified vma to reduce duplicated code.
> >
> > Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> >   include/linux/huge_mm.h | 21 +++++++++++++++++----
> >   mm/huge_memory.c        |  6 ++++++
> >   mm/khugepaged.c         |  4 +---
> >   mm/shmem.c              |  3 +--
> >   4 files changed, 25 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 0a526f211fec..f460b74619fc 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -115,6 +115,16 @@ extern struct kobj_attribute shmem_enabled_attr;
> >
> >   extern unsigned long transparent_hugepage_flags;
> >
> > +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > +                                       unsigned long vm_flags)
>
> You're passing the vma already, why do you pass vma->vm_flags
> separately? It's sufficient to pass in the vma only.

I do agree.

>
> > +{
> > +     /* Explicitly disabled through madvise. */
> > +     if ((vm_flags & VM_NOHUGEPAGE) ||
> > +         test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > +             return false;
> > +     return true;
> > +}
> > +
> >   /*
> >    * to be used on vmas which are known to support THP.
> >    * Use transparent_hugepage_enabled otherwise
> > @@ -128,15 +138,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> >       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> >               return false;
> >
> > -     if (vma->vm_flags & VM_NOHUGEPAGE)
> > +     if (!transhuge_vma_enabled(vma, vma->vm_flags))
> >               return false;
> >
> >       if (vma_is_temporary_stack(vma))
> >               return false;
> >
> > -     if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > -             return false;
> > -
> >       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> >               return true;
> >
> > @@ -362,6 +369,12 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >       return false;
> >   }
> >
> > +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > +                                       unsigned long vm_flags)
> > +{
> > +     return false;
> > +}
> > +
> >   static inline void prep_transhuge_page(struct page *page) {}
> >
> >   static inline bool is_transparent_hugepage(struct page *page)
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 76ca1eb2a223..e24a96de2e37 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -68,12 +68,18 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
> >       /* The addr is used to check if the vma size fits */
> >       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> >
> > +     if (!transhuge_vma_enabled(vma, vma->vm_flags))
> > +             return false;
> >       if (!transhuge_vma_suitable(vma, addr))
> >               return false;
> >       if (vma_is_anonymous(vma))
> >               return __transparent_hugepage_enabled(vma);
> >       if (vma_is_shmem(vma))
> >               return shmem_huge_enabled(vma);
> > +     if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> > +         !inode_is_open_for_write(vma->vm_file->f_inode) &&
> > +         (vma->vm_flags & VM_EXEC))
> > +             return true;
>
> Nit: I'm really wondering why we have 3 different functions that sound
> like they are doing the same thing
>
> transparent_hugepage_enabled(vma)

This one is called only by the smaps code, which does check everything (for
both anonymous and shmem) to decide if allocating THP is possible.
Miaohe Lin's patch adds the check for read-only FS THP.

> transhuge_vma_enabled()

This is the helper added by Miaohe Lin in this patch. It checks the
VM_NOHUGEPAGE and MMF_DISABLE_THP flags and helps eliminate some duplicate
code. This check is called in a couple of different places.

> transhuge_vma_suitable()

This one checks if the vma is aligned properly. It may be better to rename
it to transhuge_vma_aligned(). It seems this check is only called by the
page fault handler.

>
> Which check belongs where? Does it really have to be that complicated?
>
> >
> >       return false;
> >   }
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 6c0185fdd815..d97b20fad6e8 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -442,9 +442,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
> >   static bool hugepage_vma_check(struct vm_area_struct *vma,
> >                              unsigned long vm_flags)
> >   {
> > -     /* Explicitly disabled through madvise. */
> > -     if ((vm_flags & VM_NOHUGEPAGE) ||
> > -         test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > +     if (!transhuge_vma_enabled(vma, vm_flags))
> >               return false;
> >
> >       /* Enabled via shmem mount options or sysfs settings. */
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index a08cedefbfaa..1dcbec313c70 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -4032,8 +4032,7 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
> >       loff_t i_size;
> >       pgoff_t off;
> >
> > -     if ((vma->vm_flags & VM_NOHUGEPAGE) ||
> > -         test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > +     if (!transhuge_vma_enabled(vma, vma->vm_flags))
> >               return false;
> >       if (shmem_huge == SHMEM_HUGE_FORCE)
> >               return true;
> >
>
>
> --
> Thanks,
>
> David / dhildenb
>
>


* Re: [PATCH v2 4/5] mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd
  2021-04-29 13:26 ` [PATCH v2 4/5] mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd Miaohe Lin
  2021-04-29 15:02   ` David Hildenbrand
@ 2021-04-29 17:55   ` Yang Shi
  1 sibling, 0 replies; 16+ messages in thread
From: Yang Shi @ 2021-04-29 17:55 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Zi Yan, william.kucharski, Matthew Wilcox,
	Yang Shi, aneesh.kumar, Ralph Campbell, Song Liu,
	Kirill A. Shutemov, Rik van Riel, Johannes Weiner, Minchan Kim,
	Linux Kernel Mailing List, Linux MM

On Thu, Apr 29, 2021 at 6:27 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> Commit aa88b68c3b1d ("thp: keep huge zero page pinned until tlb flush")
> introduced tlb_remove_page() for huge zero page to keep it pinned until
> flush is complete and prevents the page from being split under us. But
> huge zero page is kept pinned until all relevant mm_users reach zero since
> the commit 6fcb52a56ff6 ("thp: reduce usage of huge zero page's atomic
> counter"). So tlb_remove_page_size() for huge zero pmd is unnecessary now.

From reading the git history, it seems the lifecycle of the huge zero page
is bound to the process instead of the page table due to the latter commit.
The patch looks correct to me.

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/huge_memory.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e24a96de2e37..af30338ac49c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1680,12 +1680,9 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>                 if (arch_needs_pgtable_deposit())
>                         zap_deposited_table(tlb->mm, pmd);
>                 spin_unlock(ptl);
> -               if (is_huge_zero_pmd(orig_pmd))
> -                       tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>         } else if (is_huge_zero_pmd(orig_pmd)) {
>                 zap_deposited_table(tlb->mm, pmd);
>                 spin_unlock(ptl);
> -               tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>         } else {
>                 struct page *page = NULL;
>                 int flush_needed = 1;
> --
> 2.23.0
>
>


* Re: [PATCH v2 1/5] mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
  2021-04-29 14:48   ` David Hildenbrand
@ 2021-04-30  1:39     ` Miaohe Lin
  0 siblings, 0 replies; 16+ messages in thread
From: Miaohe Lin @ 2021-04-30  1:39 UTC (permalink / raw)
  To: David Hildenbrand, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm

On 2021/4/29 22:48, David Hildenbrand wrote:
> On 29.04.21 15:26, Miaohe Lin wrote:
>> Rewrite the pgoff checking logic to remove macro HPAGE_CACHE_INDEX_MASK
>> which is only used here to simplify the code.
>>
>> Reviewed-by: Yang Shi <shy828301@gmail.com>
>> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>   include/linux/huge_mm.h | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 9626fda5efce..0a526f211fec 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -152,15 +152,13 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>     bool transparent_hugepage_enabled(struct vm_area_struct *vma);
>>   -#define HPAGE_CACHE_INDEX_MASK (HPAGE_PMD_NR - 1)
>> -
>>   static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>>           unsigned long haddr)
>>   {
>>       /* Don't have to check pgoff for anonymous vma */
>>       if (!vma_is_anonymous(vma)) {
>> -        if (((vma->vm_start >> PAGE_SHIFT) & HPAGE_CACHE_INDEX_MASK) !=
>> -            (vma->vm_pgoff & HPAGE_CACHE_INDEX_MASK))
>> +        if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>> +                HPAGE_PMD_NR))
>>               return false;
> 
> I'd have used
> 
> if (!IS_ALIGNED(PHYS_PFN(vma->vm_start) - vma->vm_pgoff,
> 

It's because I want to keep the code style consistent with hugepage_vma_check().
There is similar code in hugepage_vma_check():

	return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
			HPAGE_PMD_NR);

> Reviewed-by: David Hildenbrand <david@redhat.com>

Many thanks for review!

> 



* Re: [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
  2021-04-29 14:57   ` David Hildenbrand
  2021-04-29 16:23     ` Yang Shi
@ 2021-04-30  1:57     ` Miaohe Lin
  2021-04-30  7:49       ` David Hildenbrand
  1 sibling, 1 reply; 16+ messages in thread
From: Miaohe Lin @ 2021-04-30  1:57 UTC (permalink / raw)
  To: David Hildenbrand, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm

On 2021/4/29 22:57, David Hildenbrand wrote:
> On 29.04.21 15:26, Miaohe Lin wrote:
>> Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
>> (non-shmem) FS"), read-only THP file mapping is supported. But it
>> forgot to add checking for it in transparent_hugepage_enabled().
>> To fix it, we add checking for read-only THP file mapping and also
>> introduce helper transhuge_vma_enabled() to check whether thp is
>> enabled for specified vma to reduce duplicated code.
>>
>> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>   include/linux/huge_mm.h | 21 +++++++++++++++++----
>>   mm/huge_memory.c        |  6 ++++++
>>   mm/khugepaged.c         |  4 +---
>>   mm/shmem.c              |  3 +--
>>   4 files changed, 25 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 0a526f211fec..f460b74619fc 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -115,6 +115,16 @@ extern struct kobj_attribute shmem_enabled_attr;
>>     extern unsigned long transparent_hugepage_flags;
>>   +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>> +                      unsigned long vm_flags)
> 
> You're passing the vma already, why do you pass vma->vm_flags separately? It's sufficient to pass in the vma only.
> 

Many thanks for the comment! IMO, vm_flags may not always equal vma->vm_flags. When hugepage_vma_check()
is called from collapse_pte_mapped_thp(), vm_flags = vma->vm_flags | VM_HUGEPAGE. So I think we should
pass vm_flags here.

>> +{
>> +    /* Explicitly disabled through madvise. */
>> +    if ((vm_flags & VM_NOHUGEPAGE) ||
>> +        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> +        return false;
>> +    return true;
>> +}
>> +
>>   /*
>>    * to be used on vmas which are known to support THP.
>>    * Use transparent_hugepage_enabled otherwise
>> @@ -128,15 +138,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
>>           return false;
>>   -    if (vma->vm_flags & VM_NOHUGEPAGE)
>> +    if (!transhuge_vma_enabled(vma, vma->vm_flags))
>>           return false;
>>         if (vma_is_temporary_stack(vma))
>>           return false;
>>   -    if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> -        return false;
>> -
>>       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>>           return true;
>>   @@ -362,6 +369,12 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>>       return false;
>>   }
>>   +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>> +                      unsigned long vm_flags)
>> +{
>> +    return false;
>> +}
>> +
>>   static inline void prep_transhuge_page(struct page *page) {}
>>     static inline bool is_transparent_hugepage(struct page *page)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 76ca1eb2a223..e24a96de2e37 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -68,12 +68,18 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>       /* The addr is used to check if the vma size fits */
>>       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
>>   +    if (!transhuge_vma_enabled(vma, vma->vm_flags))
>> +        return false;
>>       if (!transhuge_vma_suitable(vma, addr))
>>           return false;
>>       if (vma_is_anonymous(vma))
>>           return __transparent_hugepage_enabled(vma);
>>       if (vma_is_shmem(vma))
>>           return shmem_huge_enabled(vma);
>> +    if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
>> +        !inode_is_open_for_write(vma->vm_file->f_inode) &&
>> +        (vma->vm_flags & VM_EXEC))
>> +        return true;
> 
> Nit: I'm really wondering why we have 3 different functions that sound like they are doing the same thing
> 
> transparent_hugepage_enabled(vma)
> transhuge_vma_enabled()
> transhuge_vma_suitable()
> 
> Which check belongs where? Does it really have to be that complicated?
> 

IMO, transhuge_vma_suitable() checks whether the pgoff, vm_start and vm_end make THP possible.
transhuge_vma_enabled() checks whether THP is explicitly disabled through madvise.
And transparent_hugepage_enabled() uses these helpers to conclude whether THP is
enabled for the specified vma.

Any suggestions?
Thanks again!

>>         return false;
>>   }
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 6c0185fdd815..d97b20fad6e8 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -442,9 +442,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
>>   static bool hugepage_vma_check(struct vm_area_struct *vma,
>>                      unsigned long vm_flags)
>>   {
>> -    /* Explicitly disabled through madvise. */
>> -    if ((vm_flags & VM_NOHUGEPAGE) ||
>> -        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> +    if (!transhuge_vma_enabled(vma, vm_flags))
>>           return false;
>>         /* Enabled via shmem mount options or sysfs settings. */
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index a08cedefbfaa..1dcbec313c70 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -4032,8 +4032,7 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>>       loff_t i_size;
>>       pgoff_t off;
>>   -    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>> -        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> +    if (!transhuge_vma_enabled(vma, vma->vm_flags))
>>           return false;
>>       if (shmem_huge == SHMEM_HUGE_FORCE)
>>           return true;
>>
> 
> 



* Re: [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
  2021-04-30  1:57     ` Miaohe Lin
@ 2021-04-30  7:49       ` David Hildenbrand
  2021-04-30  8:20         ` Miaohe Lin
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2021-04-30  7:49 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm

On 30.04.21 03:57, Miaohe Lin wrote:
> On 2021/4/29 22:57, David Hildenbrand wrote:
>> On 29.04.21 15:26, Miaohe Lin wrote:
>>> Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
>>> (non-shmem) FS"), read-only THP file mapping is supported. But it
>>> forgot to add checking for it in transparent_hugepage_enabled().
>>> To fix it, we add checking for read-only THP file mapping and also
>>> introduce helper transhuge_vma_enabled() to check whether thp is
>>> enabled for specified vma to reduce duplicated code.
>>>
>>> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>    include/linux/huge_mm.h | 21 +++++++++++++++++----
>>>    mm/huge_memory.c        |  6 ++++++
>>>    mm/khugepaged.c         |  4 +---
>>>    mm/shmem.c              |  3 +--
>>>    4 files changed, 25 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 0a526f211fec..f460b74619fc 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -115,6 +115,16 @@ extern struct kobj_attribute shmem_enabled_attr;
>>>      extern unsigned long transparent_hugepage_flags;
>>>    +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>>> +                      unsigned long vm_flags)
>>
>> You're passing the vma already, why do you pass vma->vm_flags separately? It's sufficient to pass in the vma only.
>>
> 
> Many thanks for comment! IMO, vm_flags may not always equal to vma->vm_flags. When hugepage_vma_check()
> is called from collapse_pte_mapped_thp, vma_flags = vma->vm_flags | VM_HUGEPAGE. So I think we should
> pass vm_flags here.

Oh, sorry, I missed the hugepage_vma_check() user. That's unfortunate.

>>>    static inline void prep_transhuge_page(struct page *page) {}
>>>      static inline bool is_transparent_hugepage(struct page *page)
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 76ca1eb2a223..e24a96de2e37 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -68,12 +68,18 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>        /* The addr is used to check if the vma size fits */
>>>        unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
>>>    +    if (!transhuge_vma_enabled(vma, vma->vm_flags))
>>> +        return false;
>>>        if (!transhuge_vma_suitable(vma, addr))
>>>            return false;
>>>        if (vma_is_anonymous(vma))
>>>            return __transparent_hugepage_enabled(vma);
>>>        if (vma_is_shmem(vma))
>>>            return shmem_huge_enabled(vma);
>>> +    if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
>>> +        !inode_is_open_for_write(vma->vm_file->f_inode) &&
>>> +        (vma->vm_flags & VM_EXEC))
>>> +        return true;
>>
>> Nit: I'm really wondering why we have 3 different functions that sound like they are doing the same thing
>>
>> transparent_hugepage_enabled(vma)
>> transhuge_vma_enabled()
>> transhuge_vma_suitable()
>>
>> Which check belongs where? Does it really have to be that complicated?
>>
> 
> IMO, transhuge_vma_suitable() checks whether pgoff , vm_start and vm_end is possible for thp.
> transhuge_vma_enabled() checks whether thp is explicitly disabled through madvise.
> And transparent_hugepage_enabled() use these helpers to get the conclusion whether thp is
> enabled for specified vma.
> 
> Any suggestions?

transparent_hugepage_enabled() vs. transhuge_vma_enabled() is really 
sub-optimal naming. I guess "transparent_hugepage_active()" would have 
been clearer (enabled + suitable + applicable). Cannot really give a 
good suggestion here on how to name transhuge_vma_enabled() differently.


We now have

transparent_hugepage_enabled()
-> transhuge_vma_enabled()
-> __transparent_hugepage_enabled() -> transhuge_vma_enabled()
-> shmem_huge_enabled() -> transhuge_vma_enabled()

That looks sub-optimal as well. Maybe we should have a

static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
	return transhuge_vma_enabled(vma, vma->vm_flags) &&
	       IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
	       !inode_is_open_for_write(vma->vm_file->f_inode) &&
	       (vma->vm_flags & VM_EXEC);
}

and in transparent_hugepage_enabled() only do a

if (vma->vm_file)
	return file_thp_enabled(vma);


Or move the transhuge_vma_enabled() check completely to 
transparent_hugepage_enabled() if possible.

-- 
Thanks,

David / dhildenb
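
For readers following along, here is a rough sketch of how the check from
patch 3 could be folded into a file_thp_enabled() helper along the lines of
the suggestion above. This is an illustration only, not the code that was
merged; it keeps the transhuge_vma_enabled() check in the caller, so the
helper covers just the read-only file THP conditions.

static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
	/* The caller has already done the transhuge_vma_enabled() check. */
	return IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
	       !inode_is_open_for_write(vma->vm_file->f_inode) &&
	       (vma->vm_flags & VM_EXEC);
}

bool transparent_hugepage_enabled(struct vm_area_struct *vma)
{
	/* The addr is used to check if the vma size fits */
	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;

	if (!transhuge_vma_enabled(vma, vma->vm_flags))
		return false;
	if (!transhuge_vma_suitable(vma, addr))
		return false;
	if (vma_is_anonymous(vma))
		return __transparent_hugepage_enabled(vma);
	if (vma_is_shmem(vma))
		return shmem_huge_enabled(vma);
	if (vma->vm_file)
		return file_thp_enabled(vma);

	return false;
}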



* Re: [PATCH v2 3/5] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
  2021-04-30  7:49       ` David Hildenbrand
@ 2021-04-30  8:20         ` Miaohe Lin
  0 siblings, 0 replies; 16+ messages in thread
From: Miaohe Lin @ 2021-04-30  8:20 UTC (permalink / raw)
  To: David Hildenbrand, akpm
  Cc: ziy, william.kucharski, willy, yang.shi, aneesh.kumar, rcampbell,
	songliubraving, kirill.shutemov, riel, hannes, minchan,
	linux-kernel, linux-mm

On 2021/4/30 15:49, David Hildenbrand wrote:
> On 30.04.21 03:57, Miaohe Lin wrote:
>> On 2021/4/29 22:57, David Hildenbrand wrote:
>>> On 29.04.21 15:26, Miaohe Lin wrote:
>>>> Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
>>>> (non-shmem) FS"), read-only THP file mapping is supported. But it
>>>> forgot to add checking for it in transparent_hugepage_enabled().
>>>> To fix it, we add checking for read-only THP file mapping and also
>>>> introduce helper transhuge_vma_enabled() to check whether thp is
>>>> enabled for specified vma to reduce duplicated code.
>>>>
>>>> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>    include/linux/huge_mm.h | 21 +++++++++++++++++----
>>>>    mm/huge_memory.c        |  6 ++++++
>>>>    mm/khugepaged.c         |  4 +---
>>>>    mm/shmem.c              |  3 +--
>>>>    4 files changed, 25 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 0a526f211fec..f460b74619fc 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -115,6 +115,16 @@ extern struct kobj_attribute shmem_enabled_attr;
>>>>      extern unsigned long transparent_hugepage_flags;
>>>>    +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>>>> +                      unsigned long vm_flags)
>>>
>>> You're passing the vma already, why do you pass vma->vm_flags separately? It's sufficient to pass in the vma only.
>>>
>>
>> Many thanks for comment! IMO, vm_flags may not always equal to vma->vm_flags. When hugepage_vma_check()
>> is called from collapse_pte_mapped_thp, vma_flags = vma->vm_flags | VM_HUGEPAGE. So I think we should
>> pass vm_flags here.
> 
> Oh, sorry, I missed the hugepage_vma_check() user. That's unfortunate.

Yes, that's unfortunate.

> 
>>>>    static inline void prep_transhuge_page(struct page *page) {}
>>>>      static inline bool is_transparent_hugepage(struct page *page)
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 76ca1eb2a223..e24a96de2e37 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -68,12 +68,18 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>        /* The addr is used to check if the vma size fits */
>>>>        unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
>>>>    +    if (!transhuge_vma_enabled(vma, vma->vm_flags))
>>>> +        return false;
>>>>        if (!transhuge_vma_suitable(vma, addr))
>>>>            return false;
>>>>        if (vma_is_anonymous(vma))
>>>>            return __transparent_hugepage_enabled(vma);
>>>>        if (vma_is_shmem(vma))
>>>>            return shmem_huge_enabled(vma);
>>>> +    if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
>>>> +        !inode_is_open_for_write(vma->vm_file->f_inode) &&
>>>> +        (vma->vm_flags & VM_EXEC))
>>>> +        return true;
>>>
>>> Nit: I'm really wondering why we have 3 different functions that sound like they are doing the same thing
>>>
>>> transparent_hugepage_enabled(vma)
>>> transhuge_vma_enabled()
>>> transhuge_vma_suitable()
>>>
>>> Which check belongs where? Does it really have to be that complicated?
>>>
>>
>> IMO, transhuge_vma_suitable() checks whether pgoff , vm_start and vm_end is possible for thp.
>> transhuge_vma_enabled() checks whether thp is explicitly disabled through madvise.
>> And transparent_hugepage_enabled() use these helpers to get the conclusion whether thp is
>> enabled for specified vma.
>>
>> Any suggestions?
> 
> transparent_hugepage_enabled() vs. transhuge_vma_enabled() is really sub-optimal naming. I guess "transparent_hugepage_active()" would have been clearer (enabled + suitable + applicable). Cannot really give a good suggestion here on how to name transhuge_vma_enabled() differently.
> 

I think transparent_hugepage_active() sounds better too.

> 
> We now have
> 
> transparent_hugepage_enabled()
> -> transhuge_vma_enabled()
> -> __transparent_hugepage_enabled() -> transhuge_vma_enabled()
> -> shmem_huge_enabled() -> transhuge_vma_enabled()
> 
> That looks sub-optimal as well. Maybe we should have a
> 
> static inline bool file_thp_enabled(struct vma *vma)
> {
>     return transhuge_vma_enabled() &&
>            IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>            !inode_is_open_for_write(vma->vm_file->f_inode) &&
>            (vma->vm_flags & VM_EXEC))
> }
> 
> and in transparent_hugepage_enabled() only do a
> 
> if (vma->vm_file)
>     return file_thp_enabled(vma);
> 

Really good suggestion! I will try to do this in the next version. Many thanks for your time and suggestion! :)

> 
> Or move the transhuge_vma_enabled() check completely to transparent_hugepage_enabled() if possible.
> 

