linux-kernel.vger.kernel.org archive mirror
* Re: Linux 5.19 __NR_move_pages failed for hugepage
       [not found] <BYAPR11MB3495837E180867B47E551EFCF7649@BYAPR11MB3495.namprd11.prod.outlook.com>
@ 2022-08-12  1:59 ` Miaohe Lin
  2022-08-12  3:04   ` Wang, Haiyue
  0 siblings, 1 reply; 4+ messages in thread
From: Miaohe Lin @ 2022-08-12  1:59 UTC (permalink / raw)
  To: Wang, Haiyue
  Cc: akpm, Linux-MM, linux-kernel, Naoya Horiguchi, David Hildenbrand

On 2022/8/11 16:01, Wang, Haiyue wrote:
> Hi Miaohe,

Hi Haiyue,

Many thanks for your report and debugging.

> When I call "syscall(__NR_move_pages, 0, n_pages, ptr, 0, status, 0)" to get the huge page node
> information, it fails with '-2' returned in the 'status' array.
>
> After some debugging, I found that "follow_huge_pud" returns NULL if 'FOLL_GET' is set:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e66f17ff71772b209eed39de35aaa99ba819c93d
>
> This means your patch doesn't work for huge pages:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd614841c06338a087769ee3cfa96718784d1f5
> 

Support for 'FOLL_GET' in follow_huge_pud() is introduced by the commit below:

https://lore.kernel.org/all/20220714042420.1847125-9-naoya.horiguchi@linux.dev/T/#mb3c83df087fba454b7b4ea32227fb8775ca70081

But that is still not complete. The s390 version of follow_huge_pud() still does not support FOLL_GET, and pgd-level
hugepages do not support FOLL_GET at present.

> Not sure whether you know about this issue; I'm just sharing my debug information.

I'm not sure whether it's better to revert my "problematic" patch above first and add it back once all hugetlb pages support FOLL_GET,
or whether we could just live with it. Any thoughts?


Thanks,
Miaohe Lin


> BR,
> Haiyue



* RE: Linux 5.19 __NR_move_pages failed for hugepage
  2022-08-12  1:59 ` Linux 5.19 __NR_move_pages failed for hugepage Miaohe Lin
@ 2022-08-12  3:04   ` Wang, Haiyue
  2022-08-12  6:40     ` Miaohe Lin
  0 siblings, 1 reply; 4+ messages in thread
From: Wang, Haiyue @ 2022-08-12  3:04 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, Linux-MM, linux-kernel, Naoya Horiguchi, David Hildenbrand

> -----Original Message-----
> From: Miaohe Lin <linmiaohe@huawei.com>
> Sent: Friday, August 12, 2022 09:59
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: akpm@linux-foundation.org; Linux-MM <linux-mm@kvack.org>; linux-kernel <linux-
> kernel@vger.kernel.org>; Naoya Horiguchi <naoya.horiguchi@linux.dev>; David Hildenbrand
> <david@redhat.com>
> Subject: Re: Linux 5.19 __NR_move_pages failed for hugepage
> 
> I'm not sure whether it's better to revert my "problematic" patch above first and add it back once
> all hugetlb pages support FOLL_GET,
> or whether we could just live with it. Any thoughts?
> 

TBH, the issue is more complicated than I thought. :-(

It looks like only '[PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages' will be
backported to 5.19? Only this patch has a "Fixes:" tag. If so, it will break 5.19.

I just ran VPP (https://fd.io/) and found the error message about huge page allocation
after I switched from 5.18 to 5.19.




* Re: Linux 5.19 __NR_move_pages failed for hugepage
  2022-08-12  3:04   ` Wang, Haiyue
@ 2022-08-12  6:40     ` Miaohe Lin
  2022-08-12  8:50       ` Wang, Haiyue
  0 siblings, 1 reply; 4+ messages in thread
From: Miaohe Lin @ 2022-08-12  6:40 UTC (permalink / raw)
  To: Wang, Haiyue
  Cc: akpm, Linux-MM, linux-kernel, Naoya Horiguchi, David Hildenbrand

On 2022/8/12 11:04, Wang, Haiyue wrote:
> 
> TBH, the issue is more complicated than I thought. :-(
> 
> It looks like only '[PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages' will be
> backported to 5.19? Only this patch has a "Fixes:" tag. If so, it will break 5.19.

If you want to mitigate the problem of __NR_move_pages failing for hugepage, "[PATCH v7 2/8] mm/hugetlb:
make pud_huge() and follow_huge_pud() aware of non-present pud entry" could be backported to 5.19.

> 
> I just ran VPP (https://fd.io/) and found the error message about huge page allocation
> after I switched from 5.18 to 5.19.

Do you mean the reported problem was found by VPP? Anyway, you can send a patch to fix the problem if you like. :)
I will of course try to fix it if requested (but I'm not sure how to fix it yet).

Thanks,
Miaohe Lin



* RE: Linux 5.19 __NR_move_pages failed for hugepage
  2022-08-12  6:40     ` Miaohe Lin
@ 2022-08-12  8:50       ` Wang, Haiyue
  0 siblings, 0 replies; 4+ messages in thread
From: Wang, Haiyue @ 2022-08-12  8:50 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, Linux-MM, linux-kernel, Naoya Horiguchi, David Hildenbrand

> -----Original Message-----
> From: Miaohe Lin <linmiaohe@huawei.com>
> Sent: Friday, August 12, 2022 14:41
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: akpm@linux-foundation.org; Linux-MM <linux-mm@kvack.org>; linux-kernel <linux-
> kernel@vger.kernel.org>; Naoya Horiguchi <naoya.horiguchi@linux.dev>; David Hildenbrand
> <david@redhat.com>
> Subject: Re: Linux 5.19 __NR_move_pages failed for hugepage
> 
> On 2022/8/12 11:04, Wang, Haiyue wrote:
> >
> > TBH, the issue is more complicated than I thought. :-(
> >
> > It looks like only '[PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages' will be
> > backported to 5.19? Only this patch has a "Fixes:" tag. If so, it will break 5.19.
> 
> If you want to mitigate the problem of __NR_move_pages failing for hugepage, "[PATCH v7 2/8] mm/hugetlb:
> make pud_huge() and follow_huge_pud() aware of non-present pud entry" could be backported to 5.19.
> 
> >
> > I just ran VPP (https://fd.io/) and found the error message about huge page allocation
> > after I switched from 5.18 to 5.19.
> 
> Do you mean the reported problem was found by VPP? Anyway, you can send a patch to fix the problem if you like. :)
> I will of course try to fix it if requested (but I'm not sure how to fix it yet).
> 

I tried a quick fix and cc'ed you. The design is ugly, but your fix is kept.

> Thanks,
> Miaohe Lin


