* [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable @ 2020-01-14 9:16 Li Xinhai 2020-01-14 9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Li Xinhai @ 2020-01-14 9:16 UTC (permalink / raw) To: linux-mm; +Cc: akpm, Michal Hocko, Mike Kravetz Checking hstate at early phase when isolating page, instead of during unmap and move phase, to avoid useless isolation. Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> --- include/linux/mempolicy.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 5228c62..986e51d 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -185,10 +185,9 @@ static inline bool vma_migratable(struct vm_area_struct *vma) if (vma_is_dax(vma)) return false; -#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION - if (vma->vm_flags & VM_HUGETLB) + if (is_vm_hugetlb_page(vma) && + !hugepage_migration_supported(hstate_vma(vma))) return false; -#endif /* * Migration allocates pages in the highest zone. If we cannot -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-14 9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai @ 2020-01-14 9:16 ` Li Xinhai 2020-01-14 14:09 ` Li Xinhai 2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz 2020-01-15 1:25 ` Andrew Morton 2 siblings, 1 reply; 30+ messages in thread From: Li Xinhai @ 2020-01-14 9:16 UTC (permalink / raw) To: linux-mm; +Cc: akpm, Michal Hocko, Mike Kravetz Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man page: Notes MPOL_MF_STRICT is ignored on huge page mappings. If MPOL_MF_STRICT is specified alone without any MOVE flag, we should indicate, from test_walk, that walking this vma should be skipped even if there are misplaced pages. Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> --- mm/mempolicy.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 067cf7d..c117b5f 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -656,6 +656,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end, return 1; } + /* MPOL_MF_STRICT is ignored for huge page, skip checking + * misplaced pages + */ + if ((flags & MPOL_MF_VALID) == MPOL_MF_STRICT && + is_vm_hugetlb_page(vma)) + return 1; + /* queue pages from current vma */ if (flags & MPOL_MF_VALID) return 0; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-14 9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai @ 2020-01-14 14:09 ` Li Xinhai 2020-01-14 18:27 ` Yang Shi 2020-01-15 1:07 ` Mike Kravetz 0 siblings, 2 replies; 30+ messages in thread From: Li Xinhai @ 2020-01-14 14:09 UTC (permalink / raw) To: linux-mm; +Cc: akpm, mhocko, Mike Kravetz, yang.shi, n-horiguchi Add cc to Yang Shi <yang.shi@linux.alibaba.com> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> , who has been worked on this part On 2020-01-14 at 17:16 Li Xinhai wrote: >Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >page: > >Notes >MPOL_MF_STRICT is ignored on huge page mappings. > >If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >indicate, from test_walk, that walking this vma should be skipped even if >there are misplaced pages. > >Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >Cc: Michal Hocko <mhocko@suse.com> >Cc: Mike Kravetz <mike.kravetz@oracle.com> >--- > mm/mempolicy.c | 7 +++++++ > 1 file changed, 7 insertions(+) > >diff --git a/mm/mempolicy.c b/mm/mempolicy.c >index 067cf7d..c117b5f 100644 >--- a/mm/mempolicy.c >+++ b/mm/mempolicy.c >@@ -656,6 +656,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end, > return 1; > } > >+ /* MPOL_MF_STRICT is ignored for huge page, skip checking >+ * misplaced pages >+ */ >+ if ((flags & MPOL_MF_VALID) == MPOL_MF_STRICT && >+ is_vm_hugetlb_page(vma)) >+ return 1; >+ > /* queue pages from current vma */ > if (flags & MPOL_MF_VALID) > return 0; >-- >1.8.3.1 > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-14 14:09 ` Li Xinhai @ 2020-01-14 18:27 ` Yang Shi 2020-01-15 1:07 ` Mike Kravetz 1 sibling, 0 replies; 30+ messages in thread From: Yang Shi @ 2020-01-14 18:27 UTC (permalink / raw) To: Li Xinhai, linux-mm; +Cc: akpm, mhocko, Mike Kravetz, n-horiguchi On 1/14/20 6:09 AM, Li Xinhai wrote: > Add cc to > Yang Shi <yang.shi@linux.alibaba.com> > Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > , who has been worked on this part > > On 2020-01-14 at 17:16 Li Xinhai wrote: >> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >> page: >> >> Notes >> MPOL_MF_STRICT is ignored on huge page mappings. >> >> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >> indicate, from test_walk, that walking this vma should be skipped even if >> there are misplaced pages. >> >> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Mike Kravetz <mike.kravetz@oracle.com> >> --- >> mm/mempolicy.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> index 067cf7d..c117b5f 100644 >> --- a/mm/mempolicy.c >> +++ b/mm/mempolicy.c >> @@ -656,6 +656,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end, >> return 1; >> } >> >> + /* MPOL_MF_STRICT is ignored for huge page, skip checking >> + * misplaced pages >> + */ >> + if ((flags & MPOL_MF_VALID) == MPOL_MF_STRICT && >> + is_vm_hugetlb_page(vma)) >> + return 1; It makes some sense to me. We do save some effort by not acquiring/releasing ptl and walking page tables. Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> >> /* queue pages from current vma */ >> if (flags & MPOL_MF_VALID) >> return 0; >> -- >> 1.8.3.1 > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-14 14:09 ` Li Xinhai 2020-01-14 18:27 ` Yang Shi @ 2020-01-15 1:07 ` Mike Kravetz 2020-01-15 1:24 ` Yang Shi 2020-01-15 21:07 ` Mike Kravetz 1 sibling, 2 replies; 30+ messages in thread From: Mike Kravetz @ 2020-01-15 1:07 UTC (permalink / raw) To: Li Xinhai, linux-mm; +Cc: akpm, mhocko, yang.shi, n-horiguchi On 1/14/20 6:09 AM, Li Xinhai wrote: > Add cc to > Yang Shi <yang.shi@linux.alibaba.com> > Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > , who has been worked on this part > > On 2020-01-14 at 17:16 Li Xinhai wrote: >> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >> page: >> >> Notes >> MPOL_MF_STRICT is ignored on huge page mappings. >> >> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >> indicate, from test_walk, that walking this vma should be skipped even if >> there are misplaced pages. >> >> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Mike Kravetz <mike.kravetz@oracle.com> I do not necessarily disagree with the change. However, this has made me question a couple things: 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? - Is that leftover from the the days when huge page migration was not supported? - Is it just because huge page migration is more likely to fail than base page migration. 2) Does the mbind code function properly when unable to migrate a huge page MPOL_MF_STRICT is set? A quick look at the code looks like it returns EIO. I will look into these questions. However, if someone already knows the answer that would be appreciated. -- Mike Kravetz ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 1:07 ` Mike Kravetz @ 2020-01-15 1:24 ` Yang Shi 2020-01-15 4:28 ` Mike Kravetz 2020-01-15 21:07 ` Mike Kravetz 1 sibling, 1 reply; 30+ messages in thread From: Yang Shi @ 2020-01-15 1:24 UTC (permalink / raw) To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi On 1/14/20 5:07 PM, Mike Kravetz wrote: > On 1/14/20 6:09 AM, Li Xinhai wrote: >> Add cc to >> Yang Shi <yang.shi@linux.alibaba.com> >> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >> , who has been worked on this part >> >> On 2020-01-14 at 17:16 Li Xinhai wrote: >>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >>> page: >>> >>> Notes >>> MPOL_MF_STRICT is ignored on huge page mappings. >>> >>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >>> indicate, from test_walk, that walking this vma should be skipped even if >>> there are misplaced pages. >>> >>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >>> Cc: Michal Hocko <mhocko@suse.com> >>> Cc: Mike Kravetz <mike.kravetz@oracle.com> > I do not necessarily disagree with the change. However, this has made me > question a couple things: > 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? > - Is that leftover from the the days when huge page migration was not > supported? > - Is it just because huge page migration is more likely to fail than > base page migration. > 2) Does the mbind code function properly when unable to migrate a huge page > MPOL_MF_STRICT is set? A quick look at the code looks like it returns > EIO. I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code. It would return -EIO for base pages or THP as what the manpage describes. > > I will look into these questions. However, if someone already knows the > answer that would be appreciated. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 1:24 ` Yang Shi @ 2020-01-15 4:28 ` Mike Kravetz 2020-01-15 5:23 ` Yang Shi 0 siblings, 1 reply; 30+ messages in thread From: Mike Kravetz @ 2020-01-15 4:28 UTC (permalink / raw) To: Yang Shi, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi On 1/14/20 5:24 PM, Yang Shi wrote: > > > On 1/14/20 5:07 PM, Mike Kravetz wrote: >> On 1/14/20 6:09 AM, Li Xinhai wrote: >>> Add cc to >>> Yang Shi <yang.shi@linux.alibaba.com> >>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >>> , who has been worked on this part >>> >>> On 2020-01-14 at 17:16 Li Xinhai wrote: >>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >>>> page: >>>> >>>> Notes >>>> MPOL_MF_STRICT is ignored on huge page mappings. >>>> >>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >>>> indicate, from test_walk, that walking this vma should be skipped even if >>>> there are misplaced pages. >>>> >>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >>>> Cc: Michal Hocko <mhocko@suse.com> >>>> Cc: Mike Kravetz <mike.kravetz@oracle.com> >> I do not necessarily disagree with the change. However, this has made me >> question a couple things: >> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? >> - Is that leftover from the the days when huge page migration was not >> supported? >> - Is it just because huge page migration is more likely to fail than >> base page migration. >> 2) Does the mbind code function properly when unable to migrate a huge page >> MPOL_MF_STRICT is set? A quick look at the code looks like it returns >> EIO. > > I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code. > > It would return -EIO for base pages or THP as what the manpage describes. > I was thinking about a migration failure after isolation. This block of code in do_mbind() after queue_pages_range() and mbind_range(). if (!err) { int nr_failed = 0; if (!list_empty(&pagelist)) { WARN_ON_ONCE(flags & MPOL_MF_LAZY); nr_failed = migrate_pages(&pagelist, new_page, NULL, start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND); if (nr_failed) putback_movable_pages(&pagelist); } if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT))) err = -EIO; -- Mike Kravetz ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 4:28 ` Mike Kravetz @ 2020-01-15 5:23 ` Yang Shi 2020-01-15 7:36 ` Li Xinhai 0 siblings, 1 reply; 30+ messages in thread From: Yang Shi @ 2020-01-15 5:23 UTC (permalink / raw) To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi On 1/14/20 8:28 PM, Mike Kravetz wrote: > On 1/14/20 5:24 PM, Yang Shi wrote: >> >> On 1/14/20 5:07 PM, Mike Kravetz wrote: >>> On 1/14/20 6:09 AM, Li Xinhai wrote: >>>> Add cc to >>>> Yang Shi <yang.shi@linux.alibaba.com> >>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >>>> , who has been worked on this part >>>> >>>> On 2020-01-14 at 17:16 Li Xinhai wrote: >>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >>>>> page: >>>>> >>>>> Notes >>>>> MPOL_MF_STRICT is ignored on huge page mappings. >>>>> >>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >>>>> indicate, from test_walk, that walking this vma should be skipped even if >>>>> there are misplaced pages. >>>>> >>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >>>>> Cc: Michal Hocko <mhocko@suse.com> >>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com> >>> I do not necessarily disagree with the change. However, this has made me >>> question a couple things: >>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? >>> - Is that leftover from the the days when huge page migration was not >>> supported? >>> - Is it just because huge page migration is more likely to fail than >>> base page migration. >>> 2) Does the mbind code function properly when unable to migrate a huge page >>> MPOL_MF_STRICT is set? A quick look at the code looks like it returns >>> EIO. >> I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code. >> >> It would return -EIO for base pages or THP as what the manpage describes. >> > I was thinking about a migration failure after isolation. This block of > code in do_mbind() after queue_pages_range() and mbind_range(). > > if (!err) { > int nr_failed = 0; > > if (!list_empty(&pagelist)) { > WARN_ON_ONCE(flags & MPOL_MF_LAZY); > nr_failed = migrate_pages(&pagelist, new_page, NULL, > start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND); > if (nr_failed) > putback_movable_pages(&pagelist); > } > > if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT))) > err = -EIO; Hmm.. I agree this part in man page does look ambiguous. We may assume "MPOL_MF_STRICT is ignored on huge page mappings." implies if MPOL_MF_STRICT is specified alone? If MOVE flag is specified it should return -EIO if some pages could not be moved as what the man page describes. I don't know what the intention was at the first place. We may have to dig into the history. > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 5:23 ` Yang Shi @ 2020-01-15 7:36 ` Li Xinhai 2020-01-15 17:16 ` Yang Shi 0 siblings, 1 reply; 30+ messages in thread From: Li Xinhai @ 2020-01-15 7:36 UTC (permalink / raw) To: yang.shi, Mike Kravetz, linux-mm; +Cc: akpm, mhocko, n-horiguchi On 2020-01-15 at 13:23 Yang Shi wrote: > > >On 1/14/20 8:28 PM, Mike Kravetz wrote: >> On 1/14/20 5:24 PM, Yang Shi wrote: >>> >>> On 1/14/20 5:07 PM, Mike Kravetz wrote: >>>> On 1/14/20 6:09 AM, Li Xinhai wrote: >>>>> Add cc to >>>>> Yang Shi <yang.shi@linux.alibaba.com> >>>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >>>>> , who has been worked on this part >>>>> >>>>> On 2020-01-14 at 17:16 Li Xinhai wrote: >>>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >>>>>> page: >>>>>> >>>>>> Notes >>>>>> MPOL_MF_STRICT is ignored on huge page mappings. >>>>>> >>>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >>>>>> indicate, from test_walk, that walking this vma should be skipped even if >>>>>> there are misplaced pages. >>>>>> >>>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >>>>>> Cc: Michal Hocko <mhocko@suse.com> >>>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com> >>>> I do not necessarily disagree with the change. However, this has made me >>>> question a couple things: >>>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? >>>> - Is that leftover from the the days when huge page migration was not >>>> supported? >>>> - Is it just because huge page migration is more likely to fail than >>>> base page migration. >>>> 2) Does the mbind code function properly when unable to migrate a huge page >>>> MPOL_MF_STRICT is set? A quick look at the code looks like it returns >>>> EIO. for question (2), look into queue_pages_hugetlb(), the misplaced page would not cause -EIO reported, for both STRICT set alone and STRICT set with MOVE*; that means STRICT been effectively ignored during isolation phase. In unmap and move phase, -EIO is reported if failed to move page and STRICT is set. >>> I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code. >>> >>> It would return -EIO for base pages or THP as what the manpage describes. >>> >> I was thinking about a migration failure after isolation. This block of >> code in do_mbind() after queue_pages_range() and mbind_range(). >> >> if (!err) { >> int nr_failed = 0; >> >> if (!list_empty(&pagelist)) { >> WARN_ON_ONCE(flags & MPOL_MF_LAZY); >> nr_failed = migrate_pages(&pagelist, new_page, NULL, >> start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND); >> if (nr_failed) >> putback_movable_pages(&pagelist); >> } >> >> if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT))) >> err = -EIO; > >Hmm.. I agree this part in man page does look ambiguous. We may assume >"MPOL_MF_STRICT is ignored on huge page mappings." implies if >MPOL_MF_STRICT is specified alone? If MOVE flag is specified it should >return -EIO if some pages could not be moved as what the man page describes. > It looks to me that current code has no feasible way to ignore STRICT flag for hugetlb page when failure happen in unmap&move phase, because mbind is about to handle multiple vma(i.e., hugetlb vma mixed with other vma) in one call. >I don't know what the intention was at the first place. We may have to >dig into the history. > >> > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 7:36 ` Li Xinhai @ 2020-01-15 17:16 ` Yang Shi 0 siblings, 0 replies; 30+ messages in thread From: Yang Shi @ 2020-01-15 17:16 UTC (permalink / raw) To: Li Xinhai, Mike Kravetz, linux-mm; +Cc: akpm, mhocko, n-horiguchi On 1/14/20 11:36 PM, Li Xinhai wrote: > On 2020-01-15 at 13:23 Yang Shi wrote: >> >> On 1/14/20 8:28 PM, Mike Kravetz wrote: >>> On 1/14/20 5:24 PM, Yang Shi wrote: >>>> On 1/14/20 5:07 PM, Mike Kravetz wrote: >>>>> On 1/14/20 6:09 AM, Li Xinhai wrote: >>>>>> Add cc to >>>>>> Yang Shi <yang.shi@linux.alibaba.com> >>>>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >>>>>> , who has been worked on this part >>>>>> >>>>>> On 2020-01-14 at 17:16 Li Xinhai wrote: >>>>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man >>>>>>> page: >>>>>>> >>>>>>> Notes >>>>>>> MPOL_MF_STRICT is ignored on huge page mappings. >>>>>>> >>>>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should >>>>>>> indicate, from test_walk, that walking this vma should be skipped even if >>>>>>> there are misplaced pages. >>>>>>> >>>>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> >>>>>>> Cc: Michal Hocko <mhocko@suse.com> >>>>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com> >>>>> I do not necessarily disagree with the change. However, this has made me >>>>> question a couple things: >>>>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? >>>>> - Is that leftover from the the days when huge page migration was not >>>>> supported? >>>>> - Is it just because huge page migration is more likely to fail than >>>>> base page migration. >>>>> 2) Does the mbind code function properly when unable to migrate a huge page >>>>> MPOL_MF_STRICT is set? A quick look at the code looks like it returns >>>>> EIO. > for question (2), > look into queue_pages_hugetlb(), the misplaced page would not > cause -EIO reported, for both STRICT set alone and STRICT set with MOVE*; > that means STRICT been effectively ignored during isolation phase. > > In unmap and move phase, -EIO is reported if failed to move page and > STRICT is set. > >>>> I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code. >>>> >>>> It would return -EIO for base pages or THP as what the manpage describes. >>>> >>> I was thinking about a migration failure after isolation. This block of >>> code in do_mbind() after queue_pages_range() and mbind_range(). >>> >>> if (!err) { >>> int nr_failed = 0; >>> >>> if (!list_empty(&pagelist)) { >>> WARN_ON_ONCE(flags & MPOL_MF_LAZY); >>> nr_failed = migrate_pages(&pagelist, new_page, NULL, >>> start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND); >>> if (nr_failed) >>> putback_movable_pages(&pagelist); >>> } >>> >>> if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT))) >>> err = -EIO; >> Hmm.. I agree this part in man page does look ambiguous. We may assume >> "MPOL_MF_STRICT is ignored on huge page mappings." implies if >> MPOL_MF_STRICT is specified alone? If MOVE flag is specified it should >> return -EIO if some pages could not be moved as what the man page describes. >> > It looks to me that current code has no feasible way to ignore STRICT > flag for hugetlb page when failure happen in unmap&move phase, > because mbind is about to handle multiple vma(i.e., hugetlb vma mixed with > other vma) in one call. Yes, if you have both MPOL_MF_STRICT and MPOL_MF_MOVE_{ALL} specified, so I speculate the condition is MPOL_MF_STRICT is specified alone, but the man page doesn't elaborate this. This is also what the code does. But, I'm not sure if my understanding is correct or not. > >> I don't know what the intention was at the first place. We may have to >> dig into the history. >> > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 1:07 ` Mike Kravetz 2020-01-15 1:24 ` Yang Shi @ 2020-01-15 21:07 ` Mike Kravetz 2020-01-15 21:30 ` Yang Shi 2020-01-16 7:59 ` Michal Hocko 1 sibling, 2 replies; 30+ messages in thread From: Mike Kravetz @ 2020-01-15 21:07 UTC (permalink / raw) To: Li Xinhai, linux-mm; +Cc: akpm, mhocko, yang.shi, n-horiguchi Naoya, would appreciate any comments you have as this seems to be related to your changes to add huge page migration support. On 1/14/20 5:07 PM, Mike Kravetz wrote: > 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? > - Is that leftover from the the days when huge page migration was not > supported? > - Is it just because huge page migration is more likely to fail than > base page migration. According to man page, mbind was added to v2.6.7 kernel. git history does not go back that far, so I first looked at v2.6.12. Definitions from include/linux/mempolicy.h ------------------------------------------ /* Policies */ #define MPOL_DEFAULT 0 #define MPOL_PREFERRED 1 #define MPOL_BIND 2 #define MPOL_INTERLEAVE 3 /* Flags for mbind */ #define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */ Note the MPOL_MF_STRICT would simply verify current page placement. At this time, hugetlb pages were not part of this verification. From v2.6.12 routine check_range() ... for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) { if (!vma->vm_next && vma->vm_end < end) return ERR_PTR(-EFAULT); if (prev && prev->vm_end < vma->vm_start) return ERR_PTR(-EFAULT); if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) { err = verify_pages(vma->vm_mm, vma->vm_start, vma->vm_end, nodes); if (err) { first = ERR_PTR(err); break; } } prev = vma; } ... man page history does not go back this far. The first mbind man page has some things that are interesting: 1) It DOES NOT have the note saying MPOL_MF_STRICT is ignored on huge page mappings (even though the code does this). 2) Support for huge page policy was added with 2.6.16 3) For MPOL_MF_STRICT, in 2.6.16 or later the kernel will also try to move pages to the requested node with this flag. Next look at v2.6.16 kernel ========================== This kernel introduces the MPOL_MF_MOVE* flags #define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform to mapping */ #define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */ Once again, the up front check_range() routine will skip hugetlb vma's. Note that check_range() will also isolate pages. So since hugetlb vma's are skipped, no hugetlb pages will be isolated. Since no hugetlb pages are isolated, none will be on the list to be migrate. Therefore, hugetlb pages are effectively ignored for the new MPOL_MF_MOVE* flags. This makes sense as huge page migration support was not added until the v3.12 kernel. Note that at when MPOL_MF_MOVE* flags were added to the mbind man page, the statement "MPOL_MF_STRICT is ignored on huge page mappings right now." was added. This was later changed to "MPOL_MF_STRICT is ignored on huge page mappings." Next look at v3.12 kernel ========================= The v3.12 code looks similiar to today's code. In the verify/isolation phase, v3.12 routine queue_pages_hugetlb_pmd_range() looks very similiar to queue_pages_hugetlb(). Neither will cause errors at this point in the process. And, hugetlb pages with a mapcount == 1 will potentially be isolated and added to the list of pages to be migrated. In addition, if the subsequent call to migrate_pages() fails to migrate ANY page, we return -EIO if MPOL_MF_STRICT is set. This is true even if the only page(s) that failed to migrate were hugetlb pages. It should also be noted that no mbind man page updates were made WRT hugetlb pages after migration support was added. Summary ======= It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page mappings." is left over from the original mbind implementation. When the huge page migration support was added, I can not be sure if ignoring MPOL_MF_STRICT for huge pages during the verify/isolation phase was intentional. It seems like it was as the return value from isolate_huge_page() is ignored. What should we do? ================== 1) Nothing more than optimizations by Li Xinhai. Behavior that could be seen as conflicting with man page has existed since v3.12 and I am not aware of any complaints. 2) In addition to optimizations by Li Xinhai, modify code to truly ignore MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do after a failure of migrate_pages(). We could simply traverse the list of pages that were not migrated looking for any non-hugetlb page. 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." and modify code accordingly. My suggestion would be for 1 or 2. Thoughts? -- Mike Kravetz ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 21:07 ` Mike Kravetz @ 2020-01-15 21:30 ` Yang Shi 2020-01-15 21:45 ` Mike Kravetz 2020-01-16 7:59 ` Michal Hocko 1 sibling, 1 reply; 30+ messages in thread From: Yang Shi @ 2020-01-15 21:30 UTC (permalink / raw) To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi [-- Attachment #1: Type: text/plain, Size: 5502 bytes --] On 1/15/20 1:07 PM, Mike Kravetz wrote: > Naoya, would appreciate any comments you have as this seems to be related > to your changes to add huge page migration support. > > On 1/14/20 5:07 PM, Mike Kravetz wrote: >> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings? >> - Is that leftover from the the days when huge page migration was not >> supported? >> - Is it just because huge page migration is more likely to fail than >> base page migration. > > According to man page, mbind was added to v2.6.7 kernel. git history does > not go back that far, so I first looked at v2.6.12. > > Definitions from include/linux/mempolicy.h > ------------------------------------------ > /* Policies */ > #define MPOL_DEFAULT 0 > #define MPOL_PREFERRED 1 > #define MPOL_BIND 2 > #define MPOL_INTERLEAVE 3 > > /* Flags for mbind */ > #define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */ > > Note the MPOL_MF_STRICT would simply verify current page placement. At > this time, hugetlb pages were not part of this verification. > > From v2.6.12 routine check_range() > ... > for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) { > if (!vma->vm_next && vma->vm_end < end) > return ERR_PTR(-EFAULT); > if (prev && prev->vm_end < vma->vm_start) > return ERR_PTR(-EFAULT); > if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) { > err = verify_pages(vma->vm_mm, > vma->vm_start, vma->vm_end, nodes); > if (err) { > first = ERR_PTR(err); > break; > } > } > prev = vma; > } > ... > > man page history does not go back this far. The first mbind man page has > some things that are interesting: > 1) It DOES NOT have the note saying MPOL_MF_STRICT is ignored on huge > page mappings (even though the code does this). > 2) Support for huge page policy was added with 2.6.16 > 3) For MPOL_MF_STRICT, in 2.6.16 or later the kernel will also try to > move pages to the requested node with this flag. > > Next look at v2.6.16 kernel > ========================== > This kernel introduces the MPOL_MF_MOVE* flags > #define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform > to mapping */ > #define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */ > > Once again, the up front check_range() routine will skip hugetlb vma's. > Note that check_range() will also isolate pages. So since hugetlb vma's > are skipped, no hugetlb pages will be isolated. Since no hugetlb pages > are isolated, none will be on the list to be migrate. Therefore, hugetlb > pages are effectively ignored for the new MPOL_MF_MOVE* flags. This makes > sense as huge page migration support was not added until the v3.12 kernel. > > Note that at when MPOL_MF_MOVE* flags were added to the mbind man page, > the statement "MPOL_MF_STRICT is ignored on huge page mappings right now." > was added. This was later changed to "MPOL_MF_STRICT is ignored on huge > page mappings." > > Next look at v3.12 kernel > ========================= > The v3.12 code looks similiar to today's code. In the verify/isolation > phase, v3.12 routine queue_pages_hugetlb_pmd_range() looks very similiar > to queue_pages_hugetlb(). Neither will cause errors at this point in the > process. And, hugetlb pages with a mapcount == 1 will potentially be > isolated and added to the list of pages to be migrated. In addition, if > the subsequent call to migrate_pages() fails to migrate ANY page, we > return -EIO if MPOL_MF_STRICT is set. This is true even if the only page(s) > that failed to migrate were hugetlb pages. > > It should also be noted that no mbind man page updates were made WRT > hugetlb pages after migration support was added. > > Summary > ======= > It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page > mappings." is left over from the original mbind implementation. When > the huge page migration support was added, I can not be sure if ignoring > MPOL_MF_STRICT for huge pages during the verify/isolation phase was > intentional. It seems like it was as the return value from > isolate_huge_page() is ignored. Thanks a lot for digging into the history. > > What should we do? > ================== > 1) Nothing more than optimizations by Li Xinhai. Behavior that could be > seen as conflicting with man page has existed since v3.12 and I am > not aware of any complaints. > 2) In addition to optimizations by Li Xinhai, modify code to truly ignore > MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do > after a failure of migrate_pages(). We could simply traverse the list > of pages that were not migrated looking for any non-hugetlb page. I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly. Other than that traversing page list to look for a certain type page doesn't sound scalable to me. > 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." > and modify code accordingly. > > My suggestion would be for 1 or 2. Thoughts? By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change. [-- Attachment #2: Type: text/html, Size: 6455 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 21:30 ` Yang Shi @ 2020-01-15 21:45 ` Mike Kravetz 2020-01-15 21:59 ` Yang Shi 0 siblings, 1 reply; 30+ messages in thread From: Mike Kravetz @ 2020-01-15 21:45 UTC (permalink / raw) To: Yang Shi, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi On 1/15/20 1:30 PM, Yang Shi wrote: > On 1/15/20 1:07 PM, Mike Kravetz wrote: >> What should we do? >> ================== >> 1) Nothing more than optimizations by Li Xinhai. Behavior that could be >> seen as conflicting with man page has existed since v3.12 and I am >> not aware of any complaints. >> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore >> MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do >> after a failure of migrate_pages(). We could simply traverse the list >> of pages that were not migrated looking for any non-hugetlb page. > > I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly. > You are correct. I made an assumption without actually looking at the code. :( > Other than that traversing page list to look for a certain type page doesn't sound scalable to me. > >> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." >> and modify code accordingly. >> >> My suggestion would be for 1 or 2. Thoughts? > > By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change. > Let's hope Naoya comments. My only concern with #3 is that we will be changing behavior. I do not think many people (if any) depend on existing behavior. However, you can never be sure. -- Mike Kravetz ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 21:45 ` Mike Kravetz @ 2020-01-15 21:59 ` Yang Shi 2020-01-16 8:07 ` HORIGUCHI NAOYA(堀口 直也) 0 siblings, 1 reply; 30+ messages in thread From: Yang Shi @ 2020-01-15 21:59 UTC (permalink / raw) To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi On 1/15/20 1:45 PM, Mike Kravetz wrote: > On 1/15/20 1:30 PM, Yang Shi wrote: >> On 1/15/20 1:07 PM, Mike Kravetz wrote: >>> What should we do? >>> ================== >>> 1) Nothing more than optimizations by Li Xinhai. Behavior that could be >>> seen as conflicting with man page has existed since v3.12 and I am >>> not aware of any complaints. >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore >>> MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do >>> after a failure of migrate_pages(). We could simply traverse the list >>> of pages that were not migrated looking for any non-hugetlb page. >> I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly. >> > You are correct. I made an assumption without actually looking at the code. :( > >> Other than that traversing page list to look for a certain type page doesn't sound scalable to me. >> >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." >>> and modify code accordingly. >>> >>> My suggestion would be for 1 or 2. Thoughts? >> By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change. >> > Let's hope Naoya comments. My only concern with #3 is that we will be changing > behavior. I do not think many people (if any) depend on existing behavior. > However, you can never be sure. Yes, this would change the bahavior, but I don't see why we have to treat hugetlb specially nowadays with migration supported. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 21:59 ` Yang Shi @ 2020-01-16 8:07 ` HORIGUCHI NAOYA(堀口 直也) 2020-01-16 15:32 ` Li Xinhai 0 siblings, 1 reply; 30+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2020-01-16 8:07 UTC (permalink / raw) To: Yang Shi, Mike Kravetz; +Cc: Li Xinhai, linux-mm, akpm, mhocko, n-horiguchi Hi everyone, Thank you all for finding and digging the issue. > Summary > ======= > It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page > mappings." is left over from the original mbind implementation. When > the huge page migration support was added, I can not be sure if ignoring > MPOL_MF_STRICT for huge pages during the verify/isolation phase was > intentional. It seems like it was as the return value from > isolate_huge_page() is ignored. This summary is totally correct. I've simply missed considering MPOL_MF_STRICT flag when implementing hugetlb migration. As you pointed out, the discrepacy between the manpage and the code is also due to the lack of updates on the "MPOL_MF_STRICT is ignored on huge page mappings." statement. On Wed, Jan 15, 2020 at 01:59:14PM -0800, Yang Shi wrote: > > On 1/15/20 1:45 PM, Mike Kravetz wrote: > > On 1/15/20 1:30 PM, Yang Shi wrote: > > > On 1/15/20 1:07 PM, Mike Kravetz wrote: > > > > What should we do? > > > > ================== > > > > 1) Nothing more than optimizations by Li Xinhai. Behavior that could be > > > > seen as conflicting with man page has existed since v3.12 and I am > > > > not aware of any complaints. > > > > 2) In addition to optimizations by Li Xinhai, modify code to truly ignore > > > > MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do > > > > after a failure of migrate_pages(). We could simply traverse the list > > > > of pages that were not migrated looking for any non-hugetlb page. > > > I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly. Although this behavior seems to me not prevent from finding non-hugetlb pages in migration list, this is a difference in migration behavior between normal pages and hugepages that might be better to be optimized. Maybe hugepages failed to migrate should remain in migration list after migrate_pages() returns and the should be put back via putback_movable_pages(). > > > > > You are correct. I made an assumption without actually looking at the code. :( > > > > > Other than that traversing page list to look for a certain type page doesn't sound scalable to me. > > > > > > > 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." > > > > and modify code accordingly. > > > > > > > > My suggestion would be for 1 or 2. Thoughts? > > > By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change. > > > > > Let's hope Naoya comments. My only concern with #3 is that we will be changing > > behavior. I do not think many people (if any) depend on existing behavior. > > However, you can never be sure. > > Yes, this would change the bahavior, but I don't see why we have to treat > hugetlb specially nowadays with migration supported. (Option #1 is good for short term solution, but eventually) I agree with option #3. We have no reason to handle hugetlb differently about MPOL_MF_STRICT flag. Thanks, Naoya Horiguchi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-16 8:07 ` HORIGUCHI NAOYA(堀口 直也) @ 2020-01-16 15:32 ` Li Xinhai 0 siblings, 0 replies; 30+ messages in thread From: Li Xinhai @ 2020-01-16 15:32 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也), yang.shi, Mike Kravetz Cc: linux-mm, akpm, mhocko, n-horiguchi On 2020-01-16 at 16:07 HORIGUCHI NAOYA(堀口 直也) wrote: >Hi everyone, > >Thank you all for finding and digging the issue. > >> Summary >> ======= >> It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page >> mappings." is left over from the original mbind implementation. When >> the huge page migration support was added, I can not be sure if ignoring >> MPOL_MF_STRICT for huge pages during the verify/isolation phase was >> intentional. It seems like it was as the return value from >> isolate_huge_page() is ignored. > >This summary is totally correct. I've simply missed considering MPOL_MF_STRICT >flag when implementing hugetlb migration. As you pointed out, the discrepacy >between the manpage and the code is also due to the lack of updates on the >"MPOL_MF_STRICT is ignored on huge page mappings." statement. > >On Wed, Jan 15, 2020 at 01:59:14PM -0800, Yang Shi wrote: >> >> On 1/15/20 1:45 PM, Mike Kravetz wrote: >> > On 1/15/20 1:30 PM, Yang Shi wrote: >> > > On 1/15/20 1:07 PM, Mike Kravetz wrote: >> > > > What should we do? >> > > > ================== >> > > > 1) Nothing more than optimizations by Li Xinhai. Behavior that could be >> > > > seen as conflicting with man page has existed since v3.12 and I am >> > > > not aware of any complaints. >> > > > 2) In addition to optimizations by Li Xinhai, modify code to truly ignore >> > > > MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do >> > > > after a failure of migrate_pages(). We could simply traverse the list >> > > > of pages that were not migrated looking for any non-hugetlb page. >> > > I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly. > >Although this behavior seems to me not prevent from finding non-hugetlb >pages in migration list, this is a difference in migration behavior between >normal pages and hugepages that might be better to be optimized. >Maybe hugepages failed to migrate should remain in migration list after >migrate_pages() returns and the should be put back via putback_movable_pages(). > >> > > >> > You are correct. I made an assumption without actually looking at the code. :( >> > >> > > Other than that traversing page list to look for a certain type page doesn't sound scalable to me. >> > > >> > > > 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." >> > > > and modify code accordingly. >> > > > >> > > > My suggestion would be for 1 or 2. Thoughts? >> > > By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change. >> > > >> > Let's hope Naoya comments. My only concern with #3 is that we will be changing >> > behavior. I do not think many people (if any) depend on existing behavior. >> > However, you can never be sure. >> >> Yes, this would change the bahavior, but I don't see why we have to treat >> hugetlb specially nowadays with migration supported. > >(Option #1 is good for short term solution, but eventually) I agree with option #3. >We have no reason to handle hugetlb differently about MPOL_MF_STRICT flag. Thanks. Same thoughts for option #3, but it seems better not change current behavior. Add more about current behavior of code: - In unmap&move phase, there is no different behavior of handling hugepage and non-hugepage, that is when STRICT is set, report -EIO if any page can't move, when STRICT is not set, don't report when failed to move; - In isolation phase, STRICT is effective for non-hugepge, that means set STRICT alone will cause -EIO if found misplaced pages, and set STRICT with MOVE* will cause -EIO if failed to isolate pages; for hugepage, STRICT is ignored, it don't detect misplaced pages nor report -EIO if isolation failed. This patch don't change any part of current behavior, only avoids walking page table, where currently do nothing if STRICT is set alone. > >Thanks, >Naoya Horiguchi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-15 21:07 ` Mike Kravetz 2020-01-15 21:30 ` Yang Shi @ 2020-01-16 7:59 ` Michal Hocko 2020-01-16 19:22 ` Mike Kravetz 1 sibling, 1 reply; 30+ messages in thread From: Michal Hocko @ 2020-01-16 7:59 UTC (permalink / raw) To: Mike Kravetz; +Cc: Li Xinhai, linux-mm, akpm, yang.shi, n-horiguchi On Wed 15-01-20 13:07:17, Mike Kravetz wrote: [...] > Summary > ======= > It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page > mappings." is left over from the original mbind implementation. When > the huge page migration support was added, I can not be sure if ignoring > MPOL_MF_STRICT for huge pages during the verify/isolation phase was > intentional. It seems like it was as the return value from > isolate_huge_page() is ignored. THanks for the tedious work of studying the mess^Whistory. > > What should we do? > ================== > 1) Nothing more than optimizations by Li Xinhai. Behavior that could be > seen as conflicting with man page has existed since v3.12 and I am > not aware of any complaints. > 2) In addition to optimizations by Li Xinhai, modify code to truly ignore > MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do > after a failure of migrate_pages(). We could simply traverse the list > of pages that were not migrated looking for any non-hugetlb page. > 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." > and modify code accordingly. > > My suggestion would be for 1 or 2. Thoughts? And why do we exactly need to do anything at all? There is an inconsistency that has been there for years without anybody noticing. NUMA API is a mess on its own and unfixable at this stage, there will always be some corner cases. If there is no real workload hitting this incosistency and suffering, I would rather not touch this at all. Unless the change would clean up the code or make it more maintainable. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-16 7:59 ` Michal Hocko @ 2020-01-16 19:22 ` Mike Kravetz 2020-01-17 2:32 ` Yang Shi 2020-01-17 2:38 ` Li Xinhai 0 siblings, 2 replies; 30+ messages in thread From: Mike Kravetz @ 2020-01-16 19:22 UTC (permalink / raw) To: Michal Hocko; +Cc: Li Xinhai, linux-mm, akpm, yang.shi, n-horiguchi On 1/15/20 11:59 PM, Michal Hocko wrote: > On Wed 15-01-20 13:07:17, Mike Kravetz wrote: >> What should we do? >> ================== >> 1) Nothing more than optimizations by Li Xinhai. Behavior that could be >> seen as conflicting with man page has existed since v3.12 and I am >> not aware of any complaints. >> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore >> MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do >> after a failure of migrate_pages(). We could simply traverse the list >> of pages that were not migrated looking for any non-hugetlb page. >> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." >> and modify code accordingly. >> >> My suggestion would be for 1 or 2. Thoughts? > > And why do we exactly need to do anything at all? There is an > inconsistency that has been there for years without anybody noticing. > NUMA API is a mess on its own and unfixable at this stage, there will > always be some corner cases. If there is no real workload hitting this > incosistency and suffering, I would rather not touch this at all. > Unless the change would clean up the code or make it more maintainable. That is a very valid point. Sometimes we as developers get focused on the actual code changes and fail to ask the question "does this really need to be changed?" or "what value do the code changes provide?". Li Xinhai came up with two optimizations in how the mbind code deals with hugetlb pages. This 'sub-optimal' code has existed for more than 6 years. Unless I am mistaken, nobody has actually complained or noticed this behavior. I believe Li Xinhai noticed this inefficient code via code inspection. Of course, based on what we know today one could write a test program to show the inefficient behavior. However, no real users have noticed this during the past 6 years. The proposed code changes are fairly simple. However, I would not say that they clean up the code or make it more maintainable. They essentially add or modify two checks to bail out early for hugetlb vma's if the flag which is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified. If one is trying to follow the entire mbind code path for hugetlb pages, these patches will make that easier follow/understand. That is simply because one can ignore downstream code/functionality. Based on Michal's criteria above, I now believe the code changes should not be made. Yes, they are fairly simple. However, even simple changes have the potential to break something (build breakage with v1 of patch). We should leave this code as is unless issues are reported by users. -- Mike Kravetz ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-16 19:22 ` Mike Kravetz @ 2020-01-17 2:32 ` Yang Shi 2020-01-17 2:38 ` Li Xinhai 1 sibling, 0 replies; 30+ messages in thread From: Yang Shi @ 2020-01-17 2:32 UTC (permalink / raw) To: Mike Kravetz, Michal Hocko; +Cc: Li Xinhai, linux-mm, akpm, n-horiguchi On 1/16/20 11:22 AM, Mike Kravetz wrote: > On 1/15/20 11:59 PM, Michal Hocko wrote: >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote: >>> What should we do? >>> ================== >>> 1) Nothing more than optimizations by Li Xinhai. Behavior that could be >>> seen as conflicting with man page has existed since v3.12 and I am >>> not aware of any complaints. >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore >>> MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do >>> after a failure of migrate_pages(). We could simply traverse the list >>> of pages that were not migrated looking for any non-hugetlb page. >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." >>> and modify code accordingly. >>> >>> My suggestion would be for 1 or 2. Thoughts? >> And why do we exactly need to do anything at all? There is an >> inconsistency that has been there for years without anybody noticing. >> NUMA API is a mess on its own and unfixable at this stage, there will >> always be some corner cases. If there is no real workload hitting this >> incosistency and suffering, I would rather not touch this at all. >> Unless the change would clean up the code or make it more maintainable. > That is a very valid point. Sometimes we as developers get focused on the > actual code changes and fail to ask the question "does this really need to > be changed?" or "what value do the code changes provide?". > > Li Xinhai came up with two optimizations in how the mbind code deals with > hugetlb pages. This 'sub-optimal' code has existed for more than 6 years. > Unless I am mistaken, nobody has actually complained or noticed this behavior. > I believe Li Xinhai noticed this inefficient code via code inspection. Of > course, based on what we know today one could write a test program to show > the inefficient behavior. However, no real users have noticed this during > the past 6 years. > > The proposed code changes are fairly simple. However, I would not say that > they clean up the code or make it more maintainable. They essentially add > or modify two checks to bail out early for hugetlb vma's if the flag which > is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified. > If one is trying to follow the entire mbind code path for hugetlb pages, > these patches will make that easier follow/understand. That is simply > because one can ignore downstream code/functionality. > > Based on Michal's criteria above, I now believe the code changes should not > be made. Yes, they are fairly simple. However, even simple changes have > the potential to break something (build breakage with v1 of patch). We should > leave this code as is unless issues are reported by users. I tend to agree with you. And, according to what Horiguchi explained, the intention was not ignoring hugetlb mappings when hugetlb migration was added at the first place. And, I'm supposed all of us agree hugetlb pages should be not treated specially although it is not a good timing or there is not strong motivation to fix it right now (we may correct the behavior in the future). The patch may convey the wrong information. And, the code path is definitely not a hot path, so I'm fine to drop it. And, I'm wondering if we need add some comments in the code to explain the edge case just in case someone else repeat all the tedious history digging. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-16 19:22 ` Mike Kravetz 2020-01-17 2:32 ` Yang Shi @ 2020-01-17 2:38 ` Li Xinhai 2020-01-17 7:57 ` Michal Hocko 1 sibling, 1 reply; 30+ messages in thread From: Li Xinhai @ 2020-01-17 2:38 UTC (permalink / raw) To: Mike Kravetz, mhocko; +Cc: linux-mm, akpm, yang.shi, n-horiguchi On 2020-01-17 at 03:22 Mike Kravetz wrote: >On 1/15/20 11:59 PM, Michal Hocko wrote: >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote: >>> What should we do? >>> ================== >>> 1) Nothing more than optimizations by Li Xinhai. Behavior that could be >>> seen as conflicting with man page has existed since v3.12 and I am >>> not aware of any complaints. >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore >>> MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do >>> after a failure of migrate_pages(). We could simply traverse the list >>> of pages that were not migrated looking for any non-hugetlb page. >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." >>> and modify code accordingly. >>> >>> My suggestion would be for 1 or 2. Thoughts? >> >> And why do we exactly need to do anything at all? There is an >> inconsistency that has been there for years without anybody noticing. >> NUMA API is a mess on its own and unfixable at this stage, there will >> always be some corner cases. If there is no real workload hitting this >> incosistency and suffering, I would rather not touch this at all. >> Unless the change would clean up the code or make it more maintainable. > >That is a very valid point. Sometimes we as developers get focused on the >actual code changes and fail to ask the question "does this really need to >be changed?" or "what value do the code changes provide?". > >Li Xinhai came up with two optimizations in how the mbind code deals with >hugetlb pages. This 'sub-optimal' code has existed for more than 6 years. >Unless I am mistaken, nobody has actually complained or noticed this behavior. >I believe Li Xinhai noticed this inefficient code via code inspection. Of >course, based on what we know today one could write a test program to show >the inefficient behavior. However, no real users have noticed this during >the past 6 years. > >The proposed code changes are fairly simple. However, I would not say that >they clean up the code or make it more maintainable. They essentially add >or modify two checks to bail out early for hugetlb vma's if the flag which >is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified. >If one is trying to follow the entire mbind code path for hugetlb pages, >these patches will make that easier follow/understand. That is simply >because one can ignore downstream code/functionality. > >Based on Michal's criteria above, I now believe the code changes should not >be made. Yes, they are fairly simple. However, even simple changes have >the potential to break something (build breakage with v1 of patch). We should >leave this code as is unless issues are reported by users. Acctually I am the user of this API, and when using STRICT alone to know if pages are misplaced and take action later, it seems don't work consistantly on hugepage. Initially, I assumed that I didn't use it correctly, that flag must use with MOVE*? After check the code, found that flag didn't been handled, and finally noticed that there is a NOTE about it in MAN page. I'd like the STRICT been handled, but at present since this is not going to be implemented, I have to handle differently on hugetlb mapping. At meantime, I thought that this patch can be useful and posted it, because for scenarios where hugetlb mapping are handled with other mappings, less cost for the mbind call. (although it didn't help my current case) I think if we could provid better performance, why not do that only because that code has exists for 6 years? >-- >Mike Kravetz ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-17 2:38 ` Li Xinhai @ 2020-01-17 7:57 ` Michal Hocko 2020-01-17 12:05 ` Li Xinhai 0 siblings, 1 reply; 30+ messages in thread From: Michal Hocko @ 2020-01-17 7:57 UTC (permalink / raw) To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On Fri 17-01-20 10:38:21, Li Xinhai wrote: > On 2020-01-17 at 03:22 Mike Kravetz wrote: > >On 1/15/20 11:59 PM, Michal Hocko wrote: > >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote: > >>> What should we do? > >>> ================== > >>> 1) Nothing more than optimizations by Li Xinhai. Behavior that could be > >>> seen as conflicting with man page has existed since v3.12 and I am > >>> not aware of any complaints. > >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore > >>> MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do > >>> after a failure of migrate_pages(). We could simply traverse the list > >>> of pages that were not migrated looking for any non-hugetlb page. > >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." > >>> and modify code accordingly. > >>> > >>> My suggestion would be for 1 or 2. Thoughts? > >> > >> And why do we exactly need to do anything at all? There is an > >> inconsistency that has been there for years without anybody noticing. > >> NUMA API is a mess on its own and unfixable at this stage, there will > >> always be some corner cases. If there is no real workload hitting this > >> incosistency and suffering, I would rather not touch this at all. > >> Unless the change would clean up the code or make it more maintainable. > > > >That is a very valid point. Sometimes we as developers get focused on the > >actual code changes and fail to ask the question "does this really need to > >be changed?" or "what value do the code changes provide?". > > > >Li Xinhai came up with two optimizations in how the mbind code deals with > >hugetlb pages. This 'sub-optimal' code has existed for more than 6 years. > >Unless I am mistaken, nobody has actually complained or noticed this behavior. > >I believe Li Xinhai noticed this inefficient code via code inspection. Of > >course, based on what we know today one could write a test program to show > >the inefficient behavior. However, no real users have noticed this during > >the past 6 years. > > > >The proposed code changes are fairly simple. However, I would not say that > >they clean up the code or make it more maintainable. They essentially add > >or modify two checks to bail out early for hugetlb vma's if the flag which > >is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified. > >If one is trying to follow the entire mbind code path for hugetlb pages, > >these patches will make that easier follow/understand. That is simply > >because one can ignore downstream code/functionality. > > > >Based on Michal's criteria above, I now believe the code changes should not > >be made. Yes, they are fairly simple. However, even simple changes have > >the potential to break something (build breakage with v1 of patch). We should > >leave this code as is unless issues are reported by users. > > Acctually I am the user of this API, and when using STRICT alone to know if > pages are misplaced and take action later, it seems don't work consistantly > on hugepage. Initially, I assumed that I didn't use it correctly, that flag must > use with MOVE*? After check the code, found that flag didn't been handled, > and finally noticed that there is a NOTE about it in MAN page. This is the first time you are mentioning an actual use case. This should have been expressed from the very begining. Including an explanation of what the use case is and how it is affected by the current behavior. > I'd like the STRICT been handled, but at present since this is not going to > be implemented, I have to handle differently on hugetlb mapping. > > At meantime, I thought that this patch can be useful and posted it, because > for scenarios where hugetlb mapping are handled with other mappings, less > cost for the mbind call. (although it didn't help my current case) > > I think if we could provid better performance, why not do that only because > that code has exists for 6 years? Do you have any numbers to prove performance improvements? I believe arguments against the patch have been already presented. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-17 7:57 ` Michal Hocko @ 2020-01-17 12:05 ` Li Xinhai 2020-01-17 15:20 ` Michal Hocko 0 siblings, 1 reply; 30+ messages in thread From: Li Xinhai @ 2020-01-17 12:05 UTC (permalink / raw) To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On 2020-01-17 at 15:57 Michal Hocko wrote: >On Fri 17-01-20 10:38:21, Li Xinhai wrote: >> On 2020-01-17 at 03:22 Mike Kravetz wrote: >> >On 1/15/20 11:59 PM, Michal Hocko wrote: >> >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote: >> >>> What should we do? >> >>> ================== >> >>> 1) Nothing more than optimizations by Li Xinhai. Behavior that could be >> >>> seen as conflicting with man page has existed since v3.12 and I am >> >>> not aware of any complaints. >> >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore >> >>> MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do >> >>> after a failure of migrate_pages(). We could simply traverse the list >> >>> of pages that were not migrated looking for any non-hugetlb page. >> >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings." >> >>> and modify code accordingly. >> >>> >> >>> My suggestion would be for 1 or 2. Thoughts? >> >> >> >> And why do we exactly need to do anything at all? There is an >> >> inconsistency that has been there for years without anybody noticing. >> >> NUMA API is a mess on its own and unfixable at this stage, there will >> >> always be some corner cases. If there is no real workload hitting this >> >> incosistency and suffering, I would rather not touch this at all. >> >> Unless the change would clean up the code or make it more maintainable. >> > >> >That is a very valid point. Sometimes we as developers get focused on the >> >actual code changes and fail to ask the question "does this really need to >> >be changed?" or "what value do the code changes provide?". >> > >> >Li Xinhai came up with two optimizations in how the mbind code deals with >> >hugetlb pages. This 'sub-optimal' code has existed for more than 6 years. >> >Unless I am mistaken, nobody has actually complained or noticed this behavior. >> >I believe Li Xinhai noticed this inefficient code via code inspection. Of >> >course, based on what we know today one could write a test program to show >> >the inefficient behavior. However, no real users have noticed this during >> >the past 6 years. >> > >> >The proposed code changes are fairly simple. However, I would not say that >> >they clean up the code or make it more maintainable. They essentially add >> >or modify two checks to bail out early for hugetlb vma's if the flag which >> >is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified. >> >If one is trying to follow the entire mbind code path for hugetlb pages, >> >these patches will make that easier follow/understand. That is simply >> >because one can ignore downstream code/functionality. >> > >> >Based on Michal's criteria above, I now believe the code changes should not >> >be made. Yes, they are fairly simple. However, even simple changes have >> >the potential to break something (build breakage with v1 of patch). We should >> >leave this code as is unless issues are reported by users. >> >> Acctually I am the user of this API, and when using STRICT alone to know if >> pages are misplaced and take action later, it seems don't work consistantly >> on hugepage. Initially, I assumed that I didn't use it correctly, that flag must >> use with MOVE*? After check the code, found that flag didn't been handled, >> and finally noticed that there is a NOTE about it in MAN page. > >This is the first time you are mentioning an actual use case. This >should have been expressed from the very begining. Including an >explanation of what the use case is and how it is affected by the >current behavior. > OK, that is better practice, thanks. >> I'd like the STRICT been handled, but at present since this is not going to >> be implemented, I have to handle differently on hugetlb mapping. >> >> At meantime, I thought that this patch can be useful and posted it, because >> for scenarios where hugetlb mapping are handled with other mappings, less >> cost for the mbind call. (although it didn't help my current case) >> >> I think if we could provid better performance, why not do that only because >> that code has exists for 6 years? > >Do you have any numbers to prove performance improvements? I believe >arguments against the patch have been already presented. I didn't test to know difference with this patch since it doesn't help for my case. If walking the page table and checking the present PTE is basically no cost according to existing experience, then it is not needed. Now, for my case, I am doing workaround by call mbind then check the numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is a simple change, and I am glad to add this function if no objection. >-- >Michal Hocko >SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-17 12:05 ` Li Xinhai @ 2020-01-17 15:20 ` Michal Hocko 2020-01-17 15:46 ` Li Xinhai 0 siblings, 1 reply; 30+ messages in thread From: Michal Hocko @ 2020-01-17 15:20 UTC (permalink / raw) To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On Fri 17-01-20 20:05:23, Li Xinhai wrote: [...] > Now, for my case, I am doing workaround workaround for what? > by call mbind then check the > numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is > a simple change, and I am glad to add this function if no objection. Any reason you cannot use move_pages syscall? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-17 15:20 ` Michal Hocko @ 2020-01-17 15:46 ` Li Xinhai 2020-01-20 12:45 ` Michal Hocko 0 siblings, 1 reply; 30+ messages in thread From: Li Xinhai @ 2020-01-17 15:46 UTC (permalink / raw) To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On 2020-01-17 at 23:20 Michal Hocko wrote: >On Fri 17-01-20 20:05:23, Li Xinhai wrote: >[...] >> Now, for my case, I am doing workaround > >workaround for what? > I mean now call mbind without MOVE* or STRICT, only apply policy, because my purpose is to take action later on misplaced pages. If STRICT give correct feedback, I don't need to check numa_maps. For above description, I don't use MOVE*, as want to take action later. >> by call mbind then check the >> numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is >> a simple change, and I am glad to add this function if no objection. > >Any reason you cannot use move_pages syscall? I use it, that is the 'take action later' mentioned above, I'd like to know if there are misplace pages before call it if mbind can help to give answer. >-- >Michal Hocko >SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-17 15:46 ` Li Xinhai @ 2020-01-20 12:45 ` Michal Hocko 2020-01-21 14:15 ` Li Xinhai 0 siblings, 1 reply; 30+ messages in thread From: Michal Hocko @ 2020-01-20 12:45 UTC (permalink / raw) To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On Fri 17-01-20 23:46:03, Li Xinhai wrote: [...] > >> by call mbind then check the > >> numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is > >> a simple change, and I am glad to add this function if no objection. > > > >Any reason you cannot use move_pages syscall? > > I use it, that is the 'take action later' mentioned above, I'd like to > know if there are misplace pages before call it if mbind can help > to give answer. I am sorry but I do not follow? Why do you need to do two steps? Why don't you simply call move_pages right away? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-20 12:45 ` Michal Hocko @ 2020-01-21 14:15 ` Li Xinhai 2020-01-21 14:53 ` Michal Hocko 0 siblings, 1 reply; 30+ messages in thread From: Li Xinhai @ 2020-01-21 14:15 UTC (permalink / raw) To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On 2020-01-20 at 20:45 Michal Hocko wrote: >On Fri 17-01-20 23:46:03, Li Xinhai wrote: >[...] >> >> by call mbind then check the >> >> numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is >> >> a simple change, and I am glad to add this function if no objection. >> > >> >Any reason you cannot use move_pages syscall? >> >> I use it, that is the 'take action later' mentioned above, I'd like to >> know if there are misplace pages before call it if mbind can help >> to give answer. > >I am sorry but I do not follow? Why do you need to do two steps? Why >don't you simply call move_pages right away? I am trying to decrease the number of call to move_pages(). For call move_pages(), we need prepare parameters in array for 'pages', 'nodes' (i.e., it do not support moving pages as specified through start addr and length), which could be cost. One way for handling my case is - mbind: set policy without detecting if any pages misplaced nor moving pages. (continue below part when timing is right) - move_pages: call on pages which are within range of above mbind. The other way is: - mbind: set policy and detects if any pages are misplaced. (continue below part when timing is right) - move_pages: call on pages which are within range has misplaced pages. Knowing about misplaced pages from mbind will give me flexibility for choosing different solutions. > >-- >Michal Hocko >SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-21 14:15 ` Li Xinhai @ 2020-01-21 14:53 ` Michal Hocko 2020-01-22 13:55 ` Li Xinhai 0 siblings, 1 reply; 30+ messages in thread From: Michal Hocko @ 2020-01-21 14:53 UTC (permalink / raw) To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On Tue 21-01-20 22:15:43, Li Xinhai wrote: [...] > Knowing about misplaced pages from mbind will give me flexibility for > choosing different solutions. Would /proc/pid/numa_maps help to give you rough idea which mappings of interest have misplaced pages and move_pages for those? It is still allocating arrays but is that really a big deal? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone 2020-01-21 14:53 ` Michal Hocko @ 2020-01-22 13:55 ` Li Xinhai 0 siblings, 0 replies; 30+ messages in thread From: Li Xinhai @ 2020-01-22 13:55 UTC (permalink / raw) To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi On 2020-01-21 at 22:53 Michal Hocko wrote: >On Tue 21-01-20 22:15:43, Li Xinhai wrote: >[...] >> Knowing about misplaced pages from mbind will give me flexibility for >> choosing different solutions. > >Would /proc/pid/numa_maps help to give you rough idea which mappings >of interest have misplaced pages and move_pages for those? It is still >allocating arrays but is that really a big deal? numa_maps can be checked about misplaced pages, and I mentioned it in previous mail for this purpose. For my case, using move_pages() is necessary for later action. The only difference is about mbind() could give feedback of misplaced page on impacted range, numa_mpas requres me iterate over maps (that will collect states on those maps although don't been used) to reach impacted range and check it. I may assume there is no much difference on overall performance, it is more about coding effort in user sapce application. I've being try to apply patch for mbind(i.e., apply STRICT on hugetlb mapping), and may sharing it after verification and let's see if that is usable, thanks. >-- >Michal Hocko >SUSE Labs ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable 2020-01-14 9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai 2020-01-14 9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai @ 2020-01-14 19:12 ` Mike Kravetz 2020-01-15 1:25 ` Andrew Morton 2 siblings, 0 replies; 30+ messages in thread From: Mike Kravetz @ 2020-01-14 19:12 UTC (permalink / raw) To: Li Xinhai, linux-mm; +Cc: akpm, Michal Hocko On 1/14/20 1:16 AM, Li Xinhai wrote: > Checking hstate at early phase when isolating page, instead of during > unmap and move phase, to avoid useless isolation. > > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Mike Kravetz <mike.kravetz@oracle.com> > --- > include/linux/mempolicy.h | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h > index 5228c62..986e51d 100644 > --- a/include/linux/mempolicy.h > +++ b/include/linux/mempolicy.h > @@ -185,10 +185,9 @@ static inline bool vma_migratable(struct vm_area_struct *vma) > if (vma_is_dax(vma)) > return false; > > -#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION > - if (vma->vm_flags & VM_HUGETLB) > + if (is_vm_hugetlb_page(vma) && > + !hugepage_migration_supported(hstate_vma(vma))) > return false; > -#endif > > /* > * Migration allocates pages in the highest zone. If we cannot > Thanks! It is indeed better to perform the check early. Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> -- Mike Kravetz ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable 2020-01-14 9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai 2020-01-14 9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai 2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz @ 2020-01-15 1:25 ` Andrew Morton 2 siblings, 0 replies; 30+ messages in thread From: Andrew Morton @ 2020-01-15 1:25 UTC (permalink / raw) To: Li Xinhai; +Cc: linux-mm, Michal Hocko, Mike Kravetz On Tue, 14 Jan 2020 09:16:17 +0000 Li Xinhai <lixinhai.lxh@gmail.com> wrote: > Checking hstate at early phase when isolating page, instead of during > unmap and move phase, to avoid useless isolation. > > ... > > --- a/include/linux/mempolicy.h > +++ b/include/linux/mempolicy.h > @@ -185,10 +185,9 @@ static inline bool vma_migratable(struct vm_area_struct *vma) > if (vma_is_dax(vma)) > return false; > > -#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION > - if (vma->vm_flags & VM_HUGETLB) > + if (is_vm_hugetlb_page(vma) && > + !hugepage_migration_supported(hstate_vma(vma))) > return false; > -#endif > > /* > * Migration allocates pages in the highest zone. If we cannot x86_64 allmodconfig: In file included from ./include/linux/hugetlb.h:26:0, from arch/x86/mm/fault.c:15: ./include/linux/mempolicy.h: In function vma_migratable: ./include/linux/mempolicy.h:190:4: error: implicit declaration of function hugepage_migration_supported; did you mean thp_migration_supported? [-Werror=implicit-function-declaration] !hugepage_migration_supported(hstate_vma(vma))) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ the obvious fix didn't work. ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2020-01-22 13:55 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-14 9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai 2020-01-14 9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai 2020-01-14 14:09 ` Li Xinhai 2020-01-14 18:27 ` Yang Shi 2020-01-15 1:07 ` Mike Kravetz 2020-01-15 1:24 ` Yang Shi 2020-01-15 4:28 ` Mike Kravetz 2020-01-15 5:23 ` Yang Shi 2020-01-15 7:36 ` Li Xinhai 2020-01-15 17:16 ` Yang Shi 2020-01-15 21:07 ` Mike Kravetz 2020-01-15 21:30 ` Yang Shi 2020-01-15 21:45 ` Mike Kravetz 2020-01-15 21:59 ` Yang Shi 2020-01-16 8:07 ` HORIGUCHI NAOYA(堀口 直也) 2020-01-16 15:32 ` Li Xinhai 2020-01-16 7:59 ` Michal Hocko 2020-01-16 19:22 ` Mike Kravetz 2020-01-17 2:32 ` Yang Shi 2020-01-17 2:38 ` Li Xinhai 2020-01-17 7:57 ` Michal Hocko 2020-01-17 12:05 ` Li Xinhai 2020-01-17 15:20 ` Michal Hocko 2020-01-17 15:46 ` Li Xinhai 2020-01-20 12:45 ` Michal Hocko 2020-01-21 14:15 ` Li Xinhai 2020-01-21 14:53 ` Michal Hocko 2020-01-22 13:55 ` Li Xinhai 2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz 2020-01-15 1:25 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).