linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Miaohe Lin <linmiaohe@huawei.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zi Yan <ziy@nvidia.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Matthew Wilcox <willy@infradead.org>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Song Liu <songliubraving@fb.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Minchan Kim <minchan@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v4] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
Date: Fri, 25 Feb 2022 11:47:40 +0800	[thread overview]
Message-ID: <d6780754-4938-2a55-60f9-f2d1777e82da@huawei.com> (raw)
In-Reply-To: <CAHbLzkpfN+4uN6za=it4oXWU1ySKMKad3mXMqdcuTVWyyXpSPA@mail.gmail.com>

On 2022/2/25 11:24, Yang Shi wrote:
> On Thu, Feb 24, 2022 at 6:38 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/2/25 5:56, Yang Shi wrote:
>>> On Thu, Feb 24, 2022 at 1:29 PM Yang Shi <shy828301@gmail.com> wrote:
>>>>
>>>> On Thu, Feb 24, 2022 at 10:51 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>>>>
>>>>> On 5/14/21 11:30, Miaohe Lin wrote:
>>>>>> Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for
>>>>>> (non-shmem) FS"), read-only THP file mapping is supported. But it
>>>>>> forgot to add checking for it in transparent_hugepage_enabled().
>>>>>> To fix it, we add checking for read-only THP file mapping and also
>>>>>> introduce helper transhuge_vma_enabled() to check whether thp is
>>>>>> enabled for specified vma to reduce duplicated code. We rename
>>>>>> transparent_hugepage_enabled to transparent_hugepage_active to make
>>>>>> the code easier to follow as suggested by David Hildenbrand.
>>>>>>
>>>>>> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
>>>>>> Reviewed-by: Yang Shi <shy828301@gmail.com>
>>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>>
>>>>> FYI, I stumbled upon needing this for my reproducer for the read()
>>>>> corruption [1] to work, and I think it's still not ideal. I also needed
>>>>> madvise(MADV_HUGEPAGE) for the executable file mapping to make sure
>>>>> khugepaged acts on my process. Yeah, commit 99cb0dbd47a1 suggests madvise.
>>>>> Yet, khugepaged will happily collapse file mappings even without madvise
>>>>> anyway. However, it might not know about the mm at all unless
>>>>> __khugepaged_enter() has been applied at least on one of its vma's.
>>>>> madvise() is one way to do it for executable file mappings, but it can also
>>>>> happen through e.g. do_huge_pmd_anonymous_page() on another mapping, which
>>>>> has nothing to do with the file mapping.
>>>>> So what I'm trying to say is that we are somewhat inconsistent - the rules
>>>>> to consider a vma in khugepaged seem to be clear and result of admin
>>>>> configuration and madvise, but the rules to consider a mm for khugepaged
>>>>> (which is the only way to collapse file mappings) are not completely
>>>>> identical and there might be "random luck" involved.
>>>>
>>>> Yes, khugepaged_enter() is not called for file vma explicitly. My wild
>>>> guess is THP for readonly fs was assumed to be collapsed by explicit
>>>> demand from the users, for example, madvise.
>>>>
>>>> To achieve more consistent behavior we could call khugepaged_enter()
>>>> for file fault just like what huge anonymous fault does. Of course we
>>>> need to fix khugepaged_enter() to do the proper check as well.
>>>
>>> Or we could follow shmem's implementation to call khugepaged_enter()
>>> in .mmap method for each filesystem so that khugepaged_enter() doesn't
>>> have to be called for each filemap_fault(). Luckily most filesystems
>>> use generic_file_mmap().
>>>
>>
>> I think I get the point. We need to fix the inconsistent behavior between file
>> vma and anon vma where khugepaged_enter() is only called for anon vma explicitly.
>> Am I supposed to do this or Yang Shi would kindly help do this ?
> 
> I will take care of it. I have been working on a prototype.
> 

Thanks for letting me know this to avoid my future duplicated work. And thanks for
working on this. :)

>>
>> Many thanks both of you. :)
>>
>>>>
>>>>>
>>>>> [1] https://lore.kernel.org/all/df3b5d1c-a36b-2c73-3e27-99e74983de3a@suse.cz/
>>>>>
>>>>>> ---
>>>>>> v3->v4:
>>>>>>   collect Reviewed-by tag
>>>>>>   define transhuge_vma_enabled next to transhuge_vma_suitable
>>>>>> ---
>>>>>>  fs/proc/task_mmu.c      |  2 +-
>>>>>>  include/linux/huge_mm.h | 57 +++++++++++++++++++++++++----------------
>>>>>>  mm/huge_memory.c        | 11 +++++++-
>>>>>>  mm/khugepaged.c         |  4 +--
>>>>>>  mm/shmem.c              |  3 +--
>>>>>>  5 files changed, 48 insertions(+), 29 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>>>>> index fc9784544b24..7389df326edd 100644
>>>>>> --- a/fs/proc/task_mmu.c
>>>>>> +++ b/fs/proc/task_mmu.c
>>>>>> @@ -832,7 +832,7 @@ static int show_smap(struct seq_file *m, void *v)
>>>>>>       __show_smap(m, &mss, false);
>>>>>>
>>>>>>       seq_printf(m, "THPeligible:    %d\n",
>>>>>> -                transparent_hugepage_enabled(vma));
>>>>>> +                transparent_hugepage_active(vma));
>>>>>>
>>>>>>       if (arch_pkeys_enabled())
>>>>>>               seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>> index 0a526f211fec..7b7f7b52ccb8 100644
>>>>>> --- a/include/linux/huge_mm.h
>>>>>> +++ b/include/linux/huge_mm.h
>>>>>> @@ -115,9 +115,34 @@ extern struct kobj_attribute shmem_enabled_attr;
>>>>>>
>>>>>>  extern unsigned long transparent_hugepage_flags;
>>>>>>
>>>>>> +static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>>>>>> +             unsigned long haddr)
>>>>>> +{
>>>>>> +     /* Don't have to check pgoff for anonymous vma */
>>>>>> +     if (!vma_is_anonymous(vma)) {
>>>>>> +             if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>>>>>> +                             HPAGE_PMD_NR))
>>>>>> +                     return false;
>>>>>> +     }
>>>>>> +
>>>>>> +     if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
>>>>>> +             return false;
>>>>>> +     return true;
>>>>>> +}
>>>>>> +
>>>>>> +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>>>>>> +                                       unsigned long vm_flags)
>>>>>> +{
>>>>>> +     /* Explicitly disabled through madvise. */
>>>>>> +     if ((vm_flags & VM_NOHUGEPAGE) ||
>>>>>> +         test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>>>>>> +             return false;
>>>>>> +     return true;
>>>>>> +}
>>>>>> +
>>>>>>  /*
>>>>>>   * to be used on vmas which are known to support THP.
>>>>>> - * Use transparent_hugepage_enabled otherwise
>>>>>> + * Use transparent_hugepage_active otherwise
>>>>>>   */
>>>>>>  static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>>>  {
>>>>>> @@ -128,15 +153,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>>>       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
>>>>>>               return false;
>>>>>>
>>>>>> -     if (vma->vm_flags & VM_NOHUGEPAGE)
>>>>>> +     if (!transhuge_vma_enabled(vma, vma->vm_flags))
>>>>>>               return false;
>>>>>>
>>>>>>       if (vma_is_temporary_stack(vma))
>>>>>>               return false;
>>>>>>
>>>>>> -     if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>>>>>> -             return false;
>>>>>> -
>>>>>>       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>>>>>>               return true;
>>>>>>
>>>>>> @@ -150,22 +172,7 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>>>       return false;
>>>>>>  }
>>>>>>
>>>>>> -bool transparent_hugepage_enabled(struct vm_area_struct *vma);
>>>>>> -
>>>>>> -static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>>>>>> -             unsigned long haddr)
>>>>>> -{
>>>>>> -     /* Don't have to check pgoff for anonymous vma */
>>>>>> -     if (!vma_is_anonymous(vma)) {
>>>>>> -             if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
>>>>>> -                             HPAGE_PMD_NR))
>>>>>> -                     return false;
>>>>>> -     }
>>>>>> -
>>>>>> -     if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
>>>>>> -             return false;
>>>>>> -     return true;
>>>>>> -}
>>>>>> +bool transparent_hugepage_active(struct vm_area_struct *vma);
>>>>>>
>>>>>>  #define transparent_hugepage_use_zero_page()                         \
>>>>>>       (transparent_hugepage_flags &                                   \
>>>>>> @@ -351,7 +358,7 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>>>       return false;
>>>>>>  }
>>>>>>
>>>>>> -static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>>> +static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
>>>>>>  {
>>>>>>       return false;
>>>>>>  }
>>>>>> @@ -362,6 +369,12 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>>>>>>       return false;
>>>>>>  }
>>>>>>
>>>>>> +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>>>>>> +                                       unsigned long vm_flags)
>>>>>> +{
>>>>>> +     return false;
>>>>>> +}
>>>>>> +
>>>>>>  static inline void prep_transhuge_page(struct page *page) {}
>>>>>>
>>>>>>  static inline bool is_transparent_hugepage(struct page *page)
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 76ca1eb2a223..4f37867eed12 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -63,7 +63,14 @@ static struct shrinker deferred_split_shrinker;
>>>>>>  static atomic_t huge_zero_refcount;
>>>>>>  struct page *huge_zero_page __read_mostly;
>>>>>>
>>>>>> -bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>>> +static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>>> +{
>>>>>> +     return transhuge_vma_enabled(vma, vma->vm_flags) && vma->vm_file &&
>>>>>> +            !inode_is_open_for_write(vma->vm_file->f_inode) &&
>>>>>> +            (vma->vm_flags & VM_EXEC);
>>>>>> +}
>>>>>> +
>>>>>> +bool transparent_hugepage_active(struct vm_area_struct *vma)
>>>>>>  {
>>>>>>       /* The addr is used to check if the vma size fits */
>>>>>>       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
>>>>>> @@ -74,6 +81,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>>>>>               return __transparent_hugepage_enabled(vma);
>>>>>>       if (vma_is_shmem(vma))
>>>>>>               return shmem_huge_enabled(vma);
>>>>>> +     if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>>>>>> +             return file_thp_enabled(vma);
>>>>>>
>>>>>>       return false;
>>>>>>  }
>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>> index 6c0185fdd815..d97b20fad6e8 100644
>>>>>> --- a/mm/khugepaged.c
>>>>>> +++ b/mm/khugepaged.c
>>>>>> @@ -442,9 +442,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
>>>>>>  static bool hugepage_vma_check(struct vm_area_struct *vma,
>>>>>>                              unsigned long vm_flags)
>>>>>>  {
>>>>>> -     /* Explicitly disabled through madvise. */
>>>>>> -     if ((vm_flags & VM_NOHUGEPAGE) ||
>>>>>> -         test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>>>>>> +     if (!transhuge_vma_enabled(vma, vm_flags))
>>>>>>               return false;
>>>>>>
>>>>>>       /* Enabled via shmem mount options or sysfs settings. */
>>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>>>> index a08cedefbfaa..1dcbec313c70 100644
>>>>>> --- a/mm/shmem.c
>>>>>> +++ b/mm/shmem.c
>>>>>> @@ -4032,8 +4032,7 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>>>>>>       loff_t i_size;
>>>>>>       pgoff_t off;
>>>>>>
>>>>>> -     if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>>>>>> -         test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>>>>>> +     if (!transhuge_vma_enabled(vma, vma->vm_flags))
>>>>>>               return false;
>>>>>>       if (shmem_huge == SHMEM_HUGE_FORCE)
>>>>>>               return true;
>>>>>
>>> .
>>>
>>
> .
> 



      reply	other threads:[~2022-02-25  3:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-14  9:30 [PATCH v4] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled() Miaohe Lin
2022-02-24 18:51 ` Vlastimil Babka
2022-02-24 21:29   ` Yang Shi
2022-02-24 21:56     ` Yang Shi
2022-02-25  2:38       ` Miaohe Lin
2022-02-25  3:24         ` Yang Shi
2022-02-25  3:47           ` Miaohe Lin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d6780754-4938-2a55-60f9-f2d1777e82da@huawei.com \
    --to=linmiaohe@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=rcampbell@nvidia.com \
    --cc=riel@surriel.com \
    --cc=shy828301@gmail.com \
    --cc=songliubraving@fb.com \
    --cc=vbabka@suse.cz \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).