From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: David Hildenbrand <david@redhat.com>,
Mike Kravetz <mike.kravetz@oracle.com>
Cc: akpm@linux-foundation.org, songmuchun@bytedance.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size hugetlb page
Date: Wed, 24 Aug 2022 23:06:25 +0800 [thread overview]
Message-ID: <22585fc8-b0bc-0e14-d121-2767cd178424@linux.alibaba.com> (raw)
In-Reply-To: <0f736dc5-1798-10ad-c506-9a2a38841359@redhat.com>
On 8/24/2022 10:33 PM, David Hildenbrand wrote:
> On 24.08.22 16:30, Baolin Wang wrote:
>>
>>
>> On 8/24/2022 7:55 PM, David Hildenbrand wrote:
>>> On 24.08.22 11:41, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/24/2022 3:31 PM, David Hildenbrand wrote:
>>>>>>>>>
>>>>>>>>> IMHO, these follow_huge_xxx() functions are arch-specified at first and
>>>>>>>>> were moved into the common hugetlb.c by commit 9e5fc74c3025 ("mm:
>>>>>>>>> hugetlb: Copy general hugetlb code from x86 to mm"), and now there are
>>>>>>>>> still some arch-specified follow_huge_xxx() definition, for example:
>>>>>>>>> ia64: follow_huge_addr
>>>>>>>>> powerpc: follow_huge_pd
>>>>>>>>> s390: follow_huge_pud
>>>>>>>>>
>>>>>>>>> What I mean is that follow_hugetlb_page() is a common and
>>>>>>>>> not-arch-specified function, is it suitable to change it to be
>>>>>>>>> arch-specified?
>>>>>>>>> And thinking more, can we rename follow_hugetlb_page() as
>>>>>>>>> hugetlb_page_faultin() and simplify it to only handle the page faults of
>>>>>>>>> hugetlb like the faultin_page() for normal page? That means we can make
>>>>>>>>> sure only follow_page_mask() can handle hugetlb.
>>>>>>>>>
>>>>>>>
>>>>>>> Something like that might work, but you still have two page table walkers
>>>>>>> for hugetlb. I like David's idea (if I understand it correctly) of
>>>>>>
>>>>>> What I mean is we may change the hugetlb handling like normal page:
>>>>>> 1) use follow_page_mask() to look up a hugetlb firstly.
>>>>>> 2) if can not get the hugetlb, then try to page fault by
>>>>>> hugetlb_page_faultin().
>>>>>> 3) if page fault successed, then retry to find hugetlb by
>>>>>> follow_page_mask().
>>>>>
>>>>> That implies putting more hugetlbfs special code into generic GUP,
>>>>> turning it even more complicated. But of course, it depends on how the
>>>>> end result looks like. My gut feeling was that hugetlb is better handled
>>>>> in follow_hugetlb_page() separately (just like we do with a lot of other
>>>>> page table walkers).
>>>>
>>>> OK, fair enough.
>>>>
>>>>>>
>>>>>> Just a rough thought, and I need more investigation for my idea and
>>>>>> David's idea.
>>>>>>
>>>>>>> using follow_hugetlb_page for both cases. As noted, it will need to be
>>>>>>> taught how to not trigger faults in the follow_page_mask case.
>>>>>>
>>>>>> Anyway, I also agree we need some cleanup, and firstly I think we should
>>>>>> cleanup these arch-specified follow_huge_xxx() on some architectures
>>>>>> which are similar with the common ones. I will look into these.
>>>>>
>>>>> There was a recent discussion on that, e.g.:
>>>>>
>>>>> https://lkml.kernel.org/r/20220818135717.609eef8a@thinkpad
>>>>
>>>> Thanks.
>>>>
>>>>>
>>>>>>
>>>>>> However, considering cleanup may need more investigation and
>>>>>> refactoring, now I prefer to make these bug-fix patches of this patchset
>>>>>> into mainline firstly, which are suitable to backport to old version to
>>>>>> fix potential race issues. Mike and David, how do you think? Could you
>>>>>> help to review these patches? Thanks.
>>>>>
>>>>> Patch #1 certainly add more special code just to handle another hugetlb
>>>>> corner case (CONT pages), and maybe just making it all use
>>>>> follow_hugetlb_page() would be even cleaner and less error prone.
>>>>>
>>>>> I agree that locking is shaky, but I'm not sure if we really want to
>>>>> backport this to stable trees:
>>>>>
>>>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>>>>
>>>>> "It must fix a real bug that bothers people (not a, “This could be a
>>>>> problem...” type thing)."
>>>>>
>>>>>
>>>>> Do we actually have any instance of this being a real (and not a
>>>>> theoretical) problem? If not, I'd rather clean it all up right away.
>>>>
>>>> I think this is a real problem (not theoretical), and easy to write some
>>>> code to show the issue. For example, suppose thread A is trying to look
>>>> up a CONT-PTE size hugetlb page under the lock, however antoher thread B
>>>> can migrate the CONT-PTE hugetlb page at the same time, which will cause
>>>> thread A to get an incorrect page, if thread A want to do something for
>>>> this incorrect page, error occurs.
>>>>
>>>> Actually we also want to backport these fixes to the distro with old
>>>> kernel versions to make the hugetlb more stable. Otherwise we must hit
>>>> these issues sooner or later if the customers use CONT-PTE/PMD hugetlb.
>>>>
>>>> Anyway, if you and Mike still think these issues are not important
>>>> enough to be fixed in the old versions, I can do the cleanup firstly.
>>>>
>>>
>>> [asking myself which follow_page() users actually care about hugetlb,
>>> and why we need this handling in follow_page at all]
>>>
>>> Which follow_page() user do we care about here? Primarily mm/migrate.c
>>> only I assume?
>>
>> Right, mainly affects the move_pages() syscall I think. Yes, I can not
>> know all of the users of the move_pages() syscall now or in the future
>> in our data center, but like I said the move_pages() syscall + hugetlb
>> can be a real potential stability issue.
>>
>
> I wonder if we can get rid of follow_page() completely, there are not
> too many users. Or alternatively simply make it use general GUP
> infrastructure more clearly. We'd need something like FOLL_NOFAULT that
> also covers "absolutely no faults".
I am not sure I get your point. So you want change to use
__get_user_pages() (or silimar wrappers) to look up a normal page or
hugetlb instead of follow_page()? and adding a new FOLL_NOFAULT flag to
__get_user_pages().
If I understand correctly, we still need more work to move those
arch-specified follow_huge_xxx() into follow_hugetlb_page() firstly like
we disscussed before? Which seems not backportable too.
I am not againt your idea, and I also agree that we should do some
cleanup. But the point is if we need backport patches to fix this issue,
which affects move_pages() syscall, if the answer is yes, I think my
current fixes are suitable to backport.
next prev parent reply other threads:[~2022-08-24 15:06 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-23 7:50 [PATCH v2 0/5] Fix some issues when looking up hugetlb page Baolin Wang
2022-08-23 7:50 ` [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size " Baolin Wang
2022-08-23 8:29 ` David Hildenbrand
2022-08-23 10:02 ` Baolin Wang
2022-08-23 10:23 ` David Hildenbrand
2022-08-23 23:55 ` Mike Kravetz
2022-08-24 2:06 ` Baolin Wang
2022-08-24 7:31 ` David Hildenbrand
2022-08-24 9:41 ` Baolin Wang
2022-08-24 11:55 ` David Hildenbrand
2022-08-24 14:30 ` Baolin Wang
2022-08-24 14:33 ` David Hildenbrand
2022-08-24 15:06 ` Baolin Wang [this message]
2022-08-24 15:13 ` David Hildenbrand
2022-08-24 15:23 ` Baolin Wang
2022-08-24 23:34 ` Mike Kravetz
2022-08-25 1:43 ` Baolin Wang
2022-08-25 7:10 ` David Hildenbrand
2022-08-25 7:58 ` Baolin Wang
2022-08-25 18:30 ` Mike Kravetz
2022-08-25 7:25 ` David Hildenbrand
2022-08-25 10:54 ` Baolin Wang
2022-08-25 21:13 ` Mike Kravetz
2022-08-26 22:40 ` Mike Kravetz
2022-08-27 13:59 ` Aneesh Kumar K.V
2022-08-29 18:30 ` Mike Kravetz
2022-08-23 7:50 ` [PATCH v2 2/5] mm/hugetlb: use PTE page lock to protect CONT-PTE entries Baolin Wang
2022-08-23 7:50 ` [PATCH v2 3/5] mm/hugetlb: fix races when looking up a CONT-PMD size hugetlb page Baolin Wang
2022-08-23 7:50 ` [PATCH v2 4/5] mm/hugetlb: use PMD page lock to protect CONT-PTE entries Baolin Wang
2022-08-23 8:14 ` David Hildenbrand
2022-08-23 10:12 ` Baolin Wang
2022-08-23 7:50 ` [PATCH v2 5/5] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry Baolin Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=22585fc8-b0bc-0e14-d121-2767cd178424@linux.alibaba.com \
--to=baolin.wang@linux.alibaba.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).