* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: Michal Hocko @ 2016-07-21 7:43 UTC
In-Reply-To: <578eb28b.YbRUDGz5RloTVlrE%akpm@linux-foundation.org> (not found)
To: akpm
Cc: zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm

We have further discussed the patch and I believe it is not correct. See [1].
I am proposing the following alternative.

[1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
---
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: Naoya Horiguchi @ 2016-07-21 8:13 UTC
To: Michal Hocko
Cc: akpm, zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm

On Thu, Jul 21, 2016 at 09:43:40AM +0200, Michal Hocko wrote:
> We have further discussed the patch and I believe it is not correct. See [1].
> I am proposing the following alternative.
>
> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
> ---
> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Jul 2016 09:28:13 +0200
> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
>
> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
> runs his database load with memory online and offline running in
> parallel. The reason is that huge_pmd_share might detect a shared pmd
> which is currently being migrated and so it has a migration pte which
> is !pte_huge.
>
> There doesn't seem to be any easy way to prevent the race and in
> fact seeing the migration swap entry is not harmful. Both callers of
> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
> will copy the swap entry and make it COW if needed. hugetlb_fault will
> back off, so the page fault is retried if the page is still under
> migration, and it waits for the migration to complete in hugetlb_fault.
>
> That means that the BUG_ON is wrong and we should update it. Let's
> simply check that all present ptes are pte_huge instead.
>
> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

In the early days of hugetlb, we had an assumption that !pte_none is
equivalent to pmd_present() because there was no valid non-present entry
on huge_pte. The situation has changed with hugepage migration and/or
hwpoison, so we have to care about the separation of these two, and make
sure that pte_present is true before checking pte_huge.

So I think this change is right. Thank you Zhong, Michal.

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: Michal Hocko @ 2016-07-21 10:29 UTC
To: Naoya Horiguchi
Cc: akpm, zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm

On Thu 21-07-16 08:13:55, Naoya Horiguchi wrote:
> On Thu, Jul 21, 2016 at 09:43:40AM +0200, Michal Hocko wrote:
> > Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
> [...]
>
> In the early days of hugetlb, we had an assumption that !pte_none is
> equivalent to pmd_present() because there was no valid non-present entry
> on huge_pte. The situation has changed with hugepage migration and/or
> hwpoison, so we have to care about the separation of these two, and make
> sure that pte_present is true before checking pte_huge.
>
> So I think this change is right. Thank you Zhong, Michal.
>
> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thank you for double checking, Naoya! IIUC

Fixes: 290408d4a250 ("hugetlb: hugepage migration core")

should help. Maybe we should even tag that for stable?
-- 
Michal Hocko
SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: zhong jiang @ 2016-07-21 10:54 UTC
To: Michal Hocko
Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/21 15:43, Michal Hocko wrote:
> We have further discussed the patch and I believe it is not correct. See [1].
> I am proposing the following alternative.
> [...]
> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
> [...]
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 34379d653aa3..31dd2b8b86b3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>  
>  	return pte;
>  }

I don't think the patch fixes the problem. The explanation is as follows.

  cpu0                                       cpu1
  copy_hugetlb_page_range                    try_to_unmap_one
    huge_pte_alloc   # pmd may be shared
    lock dst_pte     # dst_pte may be a migration entry
    lock src_pte     # src_pte may be a normal pte
    set_huge_pte_at  # dst_pte now points to the normal page
    spin_unlock(src_ptl)
                                             lock src_pte
    spin_unlock(dst_ptl)                     set src_pte to a migration entry
                                             spin_unlock(src_pte)

  * dst_pte is a normal pte, but the corresponding pfn is under
    migration. That is dangerous.

The race may occur, is that right? If the scenario exists, we should
think about it more.

Thanks
zhongjiang
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: Michal Hocko @ 2016-07-21 11:27 UTC
To: zhong jiang
Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm

On Thu 21-07-16 18:54:09, zhong jiang wrote:
> On 2016/7/21 15:43, Michal Hocko wrote:
> > Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
> [...]
> > -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> > +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
> [...]
>
> I don't think the patch fixes the problem. The explanation is as follows.
>
>   cpu0                                       cpu1
>   copy_hugetlb_page_range                    try_to_unmap_one
>     huge_pte_alloc   # pmd may be shared
>     lock dst_pte     # dst_pte may be a migration entry
>     lock src_pte     # src_pte may be a normal pte
>     set_huge_pte_at  # dst_pte now points to the normal page
>     spin_unlock(src_ptl)
>                                              lock src_pte
>     spin_unlock(dst_ptl)                     set src_pte to a migration entry
>                                              spin_unlock(src_pte)
>
>   * dst_pte is a normal pte, but the corresponding pfn is under
>     migration. That is dangerous.
>
> The race may occur, is that right? If the scenario exists, we should
> think about it more.

Can this happen at all? copy_hugetlb_page_range does the following to
rule out shared page table entries. At least that is my understanding of
c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")

	/* If the pagetables are shared don't copy or take references */
	if (dst_pte == src_pte)
		continue;
-- 
Michal Hocko
SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: zhong jiang @ 2016-07-21 12:14 UTC
To: Michal Hocko
Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/21 19:27, Michal Hocko wrote:
> On Thu 21-07-16 18:54:09, zhong jiang wrote:
>> [...]
>> The race may occur, is that right? If the scenario exists, we should
>> think about it more.
> Can this happen at all? copy_hugetlb_page_range does the following to
> rule out shared page table entries. At least that is my understanding of
> c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
>
> 	/* If the pagetables are shared don't copy or take references */
> 	if (dst_pte == src_pte)
> 		continue;

The mapping that vm_file points to may be shared. I am not sure, but if
so, the possibility exists. Of course, src_pte is the same as dst_pte.

When dst_pte is a migration entry and src_pte is a normal entry, if
try_to_unmap_one succeeds and copy_hugetlb_page_range then runs, dst_pte
ends up in a dangerous state.
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: Michal Hocko @ 2016-07-21 12:30 UTC
To: zhong jiang
Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm

On Thu 21-07-16 20:14:41, zhong jiang wrote:
> On 2016/7/21 19:27, Michal Hocko wrote:
> > Can this happen at all? copy_hugetlb_page_range does the following to
> > rule out shared page table entries. At least that is my understanding of
> > c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
> >
> > 	/* If the pagetables are shared don't copy or take references */
> > 	if (dst_pte == src_pte)
> > 		continue;
>
> The mapping that vm_file points to may be shared. I am not sure, but if
> so, the possibility exists. Of course, src_pte is the same as dst_pte.

I am not sure I understand. This is a fork path where the ptes are
copied over from the parent to the child. So how would vm_file differ?
-- 
Michal Hocko
SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: zhong jiang @ 2016-07-21 12:45 UTC
To: Michal Hocko
Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/21 20:30, Michal Hocko wrote:
> On Thu 21-07-16 20:14:41, zhong jiang wrote:
>> The mapping that vm_file points to may be shared. I am not sure, but if
>> so, the possibility exists. Of course, src_pte is the same as dst_pte.
> I am not sure I understand. This is a fork path where the ptes are
> copied over from the parent to the child. So how would vm_file differ?

I think you may have misunderstood my meaning. The mapping a file refers
to can be shared by other processes: the parent process has the mapping,
but it is not the only one. This is only my viewpoint. Is that right?
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
From: Michal Hocko @ 2016-07-21 12:55 UTC
To: zhong jiang
Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm

On Thu 21-07-16 20:45:15, zhong jiang wrote:
> On 2016/7/21 20:30, Michal Hocko wrote:
> > I am not sure I understand. This is a fork path where the ptes are
> > copied over from the parent to the child. So how would vm_file differ?
>
> I think you may have misunderstood my meaning. The mapping a file refers
> to can be shared by other processes: the parent process has the mapping,
> but it is not the only one. This is only my viewpoint. Is that right?

OK, now I understand what you mean. So you mean that a different process
initiates the migration while this path copies the pte. That is certainly
possible, but I still fail to see what the problem is. huge_pte_alloc will
return the identical pte whether it is a regular or a migration one. So
what exactly is the problem?
-- 
Michal Hocko
SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 12:55 ` Michal Hocko @ 2016-07-21 13:25 ` zhong jiang 2016-07-21 13:40 ` Michal Hocko 0 siblings, 1 reply; 26+ messages in thread From: zhong jiang @ 2016-07-21 13:25 UTC (permalink / raw) To: Michal Hocko Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm On 2016/7/21 20:55, Michal Hocko wrote: > On Thu 21-07-16 20:45:15, zhong jiang wrote: >> On 2016/7/21 20:30, Michal Hocko wrote: >>> On Thu 21-07-16 20:14:41, zhong jiang wrote: >>>> On 2016/7/21 19:27, Michal Hocko wrote: >>>>> On Thu 21-07-16 18:54:09, zhong jiang wrote: >>>>>> On 2016/7/21 15:43, Michal Hocko wrote: >>>>>>> We have further discussed the patch and I believe it is not correct. See [1]. >>>>>>> I am proposing the following alternative. >>>>>>> >>>>>>> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz >>>>>>> --- >>>>>>> >From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001 >>>>>>> From: Michal Hocko <mhocko@suse.com> >>>>>>> Date: Thu, 21 Jul 2016 09:28:13 +0200 >>>>>>> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON >>>>>>> >>>>>>> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he >>>>>>> runs his database load with memory online and offline running in >>>>>>> parallel. The reason is that huge_pmd_share might detect a shared pmd >>>>>>> which is currently migrated and so it has migration pte which is >>>>>>> !pte_huge. >>>>>>> >>>>>>> There doesn't seem to be any easy way to prevent from the race and in >>>>>>> fact seeing the migration swap entry is not harmful. Both callers of >>>>>>> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range >>>>>>> will copy the swap entry and make it COW if needed. hugetlb_fault will >>>>>>> back off and so the page fault is retries if the page is still under >>>>>>> migration and waits for its completion in hugetlb_fault. 
>>>>>>> >>>>>>> That means that the BUG_ON is wrong and we should update it. Let's >>>>>>> simply check that all present ptes are pte_huge instead. >>>>>>> >>>>>>> Reported-by: zhongjiang <zhongjiang@huawei.com> >>>>>>> Signed-off-by: Michal Hocko <mhocko@suse.com> >>>>>>> --- >>>>>>> mm/hugetlb.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>>>>>> index 34379d653aa3..31dd2b8b86b3 100644 >>>>>>> --- a/mm/hugetlb.c >>>>>>> +++ b/mm/hugetlb.c >>>>>>> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, >>>>>>> pte = (pte_t *)pmd_alloc(mm, pud, addr); >>>>>>> } >>>>>>> } >>>>>>> - BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte)); >>>>>>> + BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)); >>>>>>> >>>>>>> return pte; >>>>>>> } >>>>>> I don't think that the patch can fix the question. The explain is as follow. >>>>>> >>>>>> cpu0 cpu1 >>>>>> copy_hugetlb_page_range try_to_unmap_one >>>>>> huge_pte_alloc #pmd may be shared >>>>>> lock dst_pte #dst_pte may be migrate >>>>>> lock src_pte #src_pte may be normal pt1 >>>>>> set_huge_pte_at #dst_pte points to normal >>>>>> spin_unlock (src_pt1) >>>>>> lock src_pte >>>>>> spin_unlock(dst_pt1) set src_pte migrate entry >>>>>> spin_unlock(src_pte) >>>>>> * dst_pte is a normal pte, but corresponding to the >>>>>> pfn is under migrate. it is dangerous. >>>>>> >>>>>> The race may occur. is right ? if the scenario exist. we should think about more. >>>>> Can this happen at all? copy_hugetlb_page_range does the following to >>>>> rule out shared page table entries. At least that is my understanding of >>>>> c5c99429fa57 ("fix hugepages leak due to pagetable page sharing") >>>>> >>>>> /* If the pagetables are shared don't copy or take references */ >>>>> if (dst_pte == src_pte) >>>>> continue; >>>> vm_file points to mapping should be shared, I am not sure, if it is >>>> so, the possibility is exist. 
of course, src_pte is the same as the >>>> dst_pte. >>> I am not sure I understand. This is a fork path where the ptes are >>> copied over from the parent to the child. So how would vm_file differ? >> I think you can misunderstand my meaning. A file refers to the >> mapping field can be shared by other process, parent process have the >> mapping , but is not only. This is only my viewpoint. is right ?? > OK, now I understand what you mean. So you mean that a different process > initiates the migration while this path copies to pte. That is certainly > possible but I still fail to see what is the problem about that. > huge_pte_alloc will return the identical pte whether it is regular or > migration one. So what exactly is the problem? > copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal to the src_pte. The dst_pte can come from other process sharing the mapping. /* If the pagetables are shared don't copy or take references */ if (dst_pte == src_pte) continue; Even it do the fork path, we scan the i_mmap to find same pte. I think that dst_pte may come from other process. It is not the parent. it will lead to the dst_pte is not equal to the src_pte from the parent. vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { is right ?
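The fork-time step being debated can be sketched as a small userspace model. This is a hedged illustration, not kernel code: `model_pte`, `model_copy_entry` and the three entry kinds are hypothetical stand-ins, and locking, reference counting and rmap updates are omitted. It shows the two properties the thread relies on: when parent and child share the page table (dst_pte == src_pte) nothing is copied, and a migration entry is copied as-is (downgraded to a read entry for COW) instead of being treated as an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one huge PTE slot (not kernel types). */
enum model_pte_kind { MPTE_NONE, MPTE_PRESENT, MPTE_MIGRATION };

struct model_pte {
	enum model_pte_kind kind;
	bool write;          /* writable page, or a write migration entry */
	unsigned long pfn;   /* page frame the entry refers to */
};

/*
 * Per-entry step of a fork-time copy, loosely following the structure of
 * copy_hugetlb_page_range() as described in this thread. Locking, refcounts
 * and rmap updates are deliberately left out. Returns true if dst changed.
 */
static bool model_copy_entry(struct model_pte *src, struct model_pte *dst,
			     bool cow)
{
	/* Shared page table: the child already sees the parent's slot. */
	if (dst == src)
		return false;

	switch (src->kind) {
	case MPTE_NONE:
		return false;                 /* nothing mapped, nothing to copy */
	case MPTE_MIGRATION:
		if (src->write && cow)
			src->write = false;   /* downgrade to a read migration entry */
		*dst = *src;                  /* child gets the same migration entry */
		return true;
	case MPTE_PRESENT:
		if (cow)
			src->write = false;   /* write-protect the parent for COW */
		*dst = *src;
		return true;
	}
	return false;
}
```

Passing the same pointer for both arguments models the shared-pmd case and leaves the entry untouched.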
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 13:25 ` zhong jiang @ 2016-07-21 13:40 ` Michal Hocko 2016-07-21 13:58 ` zhong jiang 0 siblings, 1 reply; 26+ messages in thread From: Michal Hocko @ 2016-07-21 13:40 UTC (permalink / raw) To: Naoya Horiguchi, zhong jiang Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On Thu 21-07-16 21:25:38, zhong jiang wrote: > On 2016/7/21 20:55, Michal Hocko wrote: [...] > > OK, now I understand what you mean. So you mean that a different process > > initiates the migration while this path copies to pte. That is certainly > > possible but I still fail to see what is the problem about that. > > huge_pte_alloc will return the identical pte whether it is regular or > > migration one. So what exactly is the problem? > > > copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal > to the src_pte. The dst_pte can come from other process sharing the > mapping. So you mean that the parent doesn't have the shared pte while the child would get one? > /* If the pagetables are shared don't copy or take references */ > if (dst_pte == src_pte) > continue; > > Even it do the fork path, we scan the i_mmap to find same pte. I think > that dst_pte may come from other process. It is not the parent. it > will lead to the dst_pte is not equal to the src_pte from the parent. Let's say this would be possible (I am not really sure but for the sake of argumentation), if the src is not shared while dst is shared and the page is under migration then all the page table should be marked as swap migrate entries no? If they are not and copy_hugetlb_page_range cannot handle with that then it is a bug in copy_hugetlb_page_range which doesn't have anything to do with the BUG_ON in huge_pte_alloc. So I would argue that if the problem exists at all it is a separate issue IMHO. Naoya, could you comment on that please? 
-- Michal Hocko SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 13:40 ` Michal Hocko @ 2016-07-21 13:58 ` zhong jiang 2016-07-21 14:01 ` Michal Hocko 0 siblings, 1 reply; 26+ messages in thread From: zhong jiang @ 2016-07-21 13:58 UTC (permalink / raw) To: Michal Hocko Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On 2016/7/21 21:40, Michal Hocko wrote: > On Thu 21-07-16 21:25:38, zhong jiang wrote: >> On 2016/7/21 20:55, Michal Hocko wrote: > [...] >>> OK, now I understand what you mean. So you mean that a different process >>> initiates the migration while this path copies to pte. That is certainly >>> possible but I still fail to see what is the problem about that. >>> huge_pte_alloc will return the identical pte whether it is regular or >>> migration one. So what exactly is the problem? >>> >> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal >> to the src_pte. The dst_pte can come from other process sharing the >> mapping. > So you mean that the parent doesn't have the shared pte while the child > would get one? > no, parent must have the shared pte because the the child copy the parent. but parent is not the only source pte we can get. when we scan the maping->i_mmap, firstly ,it can obtain a shared pte from other process. but I am not sure. >> /* If the pagetables are shared don't copy or take references */ >> if (dst_pte == src_pte) >> continue; >> >> Even it do the fork path, we scan the i_mmap to find same pte. I think >> that dst_pte may come from other process. It is not the parent. it >> will lead to the dst_pte is not equal to the src_pte from the parent. > Let's say this would be possible (I am not really sure but for the sake > of argumentation), if the src is not shared while dst is shared and the > page is under migration then all the page table should be marked as > swap migrate entries no? 
If they are not and copy_hugetlb_page_range > cannot handle with that then it is a bug in copy_hugetlb_page_range > which doesn't have anything to do with the BUG_ON in huge_pte_alloc. > So I would argue that if the problem exists at all it is a separate > issue IMHO. Yes, it is a separate issue. > Naoya, could you comment on that please?
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 13:58 ` zhong jiang @ 2016-07-21 14:01 ` Michal Hocko 2016-07-21 14:13 ` zhong jiang 0 siblings, 1 reply; 26+ messages in thread From: Michal Hocko @ 2016-07-21 14:01 UTC (permalink / raw) To: zhong jiang Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On Thu 21-07-16 21:58:23, zhong jiang wrote: > On 2016/7/21 21:40, Michal Hocko wrote: > > On Thu 21-07-16 21:25:38, zhong jiang wrote: > >> On 2016/7/21 20:55, Michal Hocko wrote: > > [...] > >>> OK, now I understand what you mean. So you mean that a different process > >>> initiates the migration while this path copies to pte. That is certainly > >>> possible but I still fail to see what is the problem about that. > >>> huge_pte_alloc will return the identical pte whether it is regular or > >>> migration one. So what exactly is the problem? > >>> > >> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal > >> to the src_pte. The dst_pte can come from other process sharing the > >> mapping. > > So you mean that the parent doesn't have the shared pte while the child > > would get one? > > > no, parent must have the shared pte because the the child copy the > parent. but parent is not the only source pte we can get. when we > scan the maping->i_mmap, firstly ,it can obtain a shared pte from > other process. but I am not sure. But then all the shared ptes should be identical, no? Or am I missing something? -- Michal Hocko SUSE Labs
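Michal's point that all sharers of a PMD page see identical ptes can be illustrated with a toy model. Everything here (`model_pmd_page`, `model_mm`, `model_huge_pte`, the 512-entry table size) is a hypothetical stand-in that assumes x86-64-like 2 MiB huge pages under a 1 GiB PUD. Once several address spaces point at the same PMD page, the slot for a given address is the same memory location, so it makes no difference through which process that page was found.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 512 PMD entries per page-table page, as on x86-64. */
#define MODEL_PTRS_PER_PMD 512

/* Hypothetical model of a PMD page that hugetlb PMD sharing hands out. */
struct model_pmd_page {
	unsigned long entry[MODEL_PTRS_PER_PMD];
};

/* Each address space only records which PMD page its PUD slot points to. */
struct model_mm {
	struct model_pmd_page *pmd;
};

/* Slot for the idx-th 2 MiB huge page inside the shared 1 GiB region. */
static unsigned long *model_huge_pte(struct model_mm *mm, size_t idx)
{
	return &mm->pmd->entry[idx];
}
```

A write through any sharer is visible through all the others, which is what makes pointer comparison of the slots meaningful.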
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 14:01 ` Michal Hocko @ 2016-07-21 14:13 ` zhong jiang 2016-07-21 14:27 ` Michal Hocko 0 siblings, 1 reply; 26+ messages in thread From: zhong jiang @ 2016-07-21 14:13 UTC (permalink / raw) To: Michal Hocko Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On 2016/7/21 22:01, Michal Hocko wrote: > On Thu 21-07-16 21:58:23, zhong jiang wrote: >> On 2016/7/21 21:40, Michal Hocko wrote: >>> On Thu 21-07-16 21:25:38, zhong jiang wrote: >>>> On 2016/7/21 20:55, Michal Hocko wrote: >>> [...] >>>>> OK, now I understand what you mean. So you mean that a different process >>>>> initiates the migration while this path copies to pte. That is certainly >>>>> possible but I still fail to see what is the problem about that. >>>>> huge_pte_alloc will return the identical pte whether it is regular or >>>>> migration one. So what exactly is the problem? >>>>> >>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal >>>> to the src_pte. The dst_pte can come from other process sharing the >>>> mapping. >>> So you mean that the parent doesn't have the shared pte while the child >>> would get one? >>> >> no, parent must have the shared pte because the the child copy the >> parent. but parent is not the only source pte we can get. when we >> scan the maping->i_mmap, firstly ,it can obtain a shared pte from >> other process. but I am not sure. > But then all the shared ptes should be identical, no? Or am I missing > something? all the shared ptes should be identical, but there is a possibility that new process want to share the pte from other process , other than the parent, For the first time the process is about to share pte with it. is it possiblity?
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 14:13 ` zhong jiang @ 2016-07-21 14:27 ` Michal Hocko 2016-07-21 14:33 ` zhong jiang 0 siblings, 1 reply; 26+ messages in thread From: Michal Hocko @ 2016-07-21 14:27 UTC (permalink / raw) To: zhong jiang Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On Thu 21-07-16 22:13:55, zhong jiang wrote: > On 2016/7/21 22:01, Michal Hocko wrote: > > On Thu 21-07-16 21:58:23, zhong jiang wrote: > >> On 2016/7/21 21:40, Michal Hocko wrote: > >>> On Thu 21-07-16 21:25:38, zhong jiang wrote: > >>>> On 2016/7/21 20:55, Michal Hocko wrote: > >>> [...] > >>>>> OK, now I understand what you mean. So you mean that a different process > >>>>> initiates the migration while this path copies to pte. That is certainly > >>>>> possible but I still fail to see what is the problem about that. > >>>>> huge_pte_alloc will return the identical pte whether it is regular or > >>>>> migration one. So what exactly is the problem? > >>>>> > >>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal > >>>> to the src_pte. The dst_pte can come from other process sharing the > >>>> mapping. > >>> So you mean that the parent doesn't have the shared pte while the child > >>> would get one? > >>> > >> no, parent must have the shared pte because the the child copy the > >> parent. but parent is not the only source pte we can get. when we > >> scan the maping->i_mmap, firstly ,it can obtain a shared pte from > >> other process. but I am not sure. > > But then all the shared ptes should be identical, no? Or am I missing > > something? > all the shared ptes should be identical, but there is a possibility that new process > want to share the pte from other process , other than the parent, For the first time > the process is about to share pte with it. is it possiblity? I do not see how. 
They are operating on the same mapping so I really do not see how different process makes any difference. -- Michal Hocko SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 14:27 ` Michal Hocko @ 2016-07-21 14:33 ` zhong jiang 2016-07-22 7:17 ` Naoya Horiguchi 0 siblings, 1 reply; 26+ messages in thread From: zhong jiang @ 2016-07-21 14:33 UTC (permalink / raw) To: Michal Hocko Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On 2016/7/21 22:27, Michal Hocko wrote: > On Thu 21-07-16 22:13:55, zhong jiang wrote: >> On 2016/7/21 22:01, Michal Hocko wrote: >>> On Thu 21-07-16 21:58:23, zhong jiang wrote: >>>> On 2016/7/21 21:40, Michal Hocko wrote: >>>>> On Thu 21-07-16 21:25:38, zhong jiang wrote: >>>>>> On 2016/7/21 20:55, Michal Hocko wrote: >>>>> [...] >>>>>>> OK, now I understand what you mean. So you mean that a different process >>>>>>> initiates the migration while this path copies to pte. That is certainly >>>>>>> possible but I still fail to see what is the problem about that. >>>>>>> huge_pte_alloc will return the identical pte whether it is regular or >>>>>>> migration one. So what exactly is the problem? >>>>>>> >>>>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal >>>>>> to the src_pte. The dst_pte can come from other process sharing the >>>>>> mapping. >>>>> So you mean that the parent doesn't have the shared pte while the child >>>>> would get one? >>>>> >>>> no, parent must have the shared pte because the the child copy the >>>> parent. but parent is not the only source pte we can get. when we >>>> scan the maping->i_mmap, firstly ,it can obtain a shared pte from >>>> other process. but I am not sure. >>> But then all the shared ptes should be identical, no? Or am I missing >>> something? >> all the shared ptes should be identical, but there is a possibility that new process >> want to share the pte from other process , other than the parent, For the first time >> the process is about to share pte with it. is it possiblity? > I do not see how. 
They are operating on the same mapping so I really do > not see how different process makes any difference. > ok , In a words . the new process get the shared pte, The shared pte not come from the parent process. so , src_pte is not equal to dst_pte. because src_pte come from the parent, while dst_pte come from other process. obviously, it is not same.
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 14:33 ` zhong jiang @ 2016-07-22 7:17 ` Naoya Horiguchi 2016-07-26 7:58 ` Michal Hocko 0 siblings, 1 reply; 26+ messages in thread From: Naoya Horiguchi @ 2016-07-22 7:17 UTC (permalink / raw) To: zhong jiang Cc: Michal Hocko, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On Thu, Jul 21, 2016 at 10:33:47PM +0800, zhong jiang wrote: > On 2016/7/21 22:27, Michal Hocko wrote: > > On Thu 21-07-16 22:13:55, zhong jiang wrote: > >> On 2016/7/21 22:01, Michal Hocko wrote: > >>> On Thu 21-07-16 21:58:23, zhong jiang wrote: > >>>> On 2016/7/21 21:40, Michal Hocko wrote: > >>>>> On Thu 21-07-16 21:25:38, zhong jiang wrote: > >>>>>> On 2016/7/21 20:55, Michal Hocko wrote: > >>>>> [...] > >>>>>>> OK, now I understand what you mean. So you mean that a different process > >>>>>>> initiates the migration while this path copies to pte. That is certainly > >>>>>>> possible but I still fail to see what is the problem about that. > >>>>>>> huge_pte_alloc will return the identical pte whether it is regular or > >>>>>>> migration one. So what exactly is the problem? > >>>>>>> > >>>>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal > >>>>>> to the src_pte. The dst_pte can come from other process sharing the > >>>>>> mapping. > >>>>> So you mean that the parent doesn't have the shared pte while the child > >>>>> would get one? > >>>>> > >>>> no, parent must have the shared pte because the the child copy the > >>>> parent. but parent is not the only source pte we can get. when we > >>>> scan the maping->i_mmap, firstly ,it can obtain a shared pte from > >>>> other process. but I am not sure. > >>> But then all the shared ptes should be identical, no? Or am I missing > >>> something? 
> >> all the shared ptes should be identical, but there is a possibility that new process > >> want to share the pte from other process , other than the parent, For the first time > >> the process is about to share pte with it. is it possiblity? > > I do not see how. They are operating on the same mapping so I really do > > not see how different process makes any difference. > > > ok , In a words . the new process get the shared pte, The shared pte not come from the parent process. > so , src_pte is not equal to dst_pte. because src_pte come from the parent, while dst_pte come from > other process. obviously, it is not same. I think that (src_pte != dst_pte) can happen and that's ok if there's no migration entry. But even if we have both of normal entry and migration entry for one hugepage, that still looks fine to me because the running migration operation fails (because there remains mapcounts on the source hugepage), and all migration entries are turned back to normal entries pointing to the source hugepage. Could you try to see and share what happens on your workload with Michal's patch? If something weird/critical still happens, let's merge your patch. # I'm trying to write some test cases for it, but might take some time ... Thanks, Naoya Horiguchi
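Naoya's argument, that a leftover normal mapping simply makes the migration fail and every migration entry is then restored to point at the source hugepage, can be modeled as follows. This is a deliberately simplified sketch under stated assumptions: `model_unmap` and `model_migrate` are hypothetical analogues of try_to_unmap() and the restore step (remove_migration_ptes()), the `reachable` parameter is an artificial way to leave one mapping unconverted, only a single source page is modeled, and locking is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's. */
enum slot_kind { SLOT_PRESENT, SLOT_MIGRATION };

struct model_slot {
	enum slot_kind kind;
	int pfn;
};

struct model_page {
	int pfn;
	int mapcount;  /* number of present slots mapping this page */
};

/*
 * try_to_unmap() analogue: convert the first `reachable` slots that map
 * `src` into migration entries. Slots past `reachable` model a mapping the
 * unmap pass did not convert (hypothetically, one installed concurrently).
 */
static void model_unmap(struct model_slot *slots, size_t n, size_t reachable,
			struct model_page *src)
{
	for (size_t i = 0; i < n && i < reachable; i++) {
		if (slots[i].kind == SLOT_PRESENT && slots[i].pfn == src->pfn) {
			slots[i].kind = SLOT_MIGRATION;
			src->mapcount--;
		}
	}
}

/*
 * Migration attempt: move only if no mapping of src remains, then resolve
 * every migration entry (all of which belong to src in this single-page
 * model) to either the new page or, on failure, back to the source page.
 * Returns the pfn the entries now point at.
 */
static int model_migrate(struct model_slot *slots, size_t n, size_t reachable,
			 struct model_page *src, int new_pfn)
{
	model_unmap(slots, n, reachable, src);

	bool moved = (src->mapcount == 0);
	int target = moved ? new_pfn : src->pfn;

	for (size_t i = 0; i < n; i++) {
		if (slots[i].kind == SLOT_MIGRATION) {
			slots[i].kind = SLOT_PRESENT;
			slots[i].pfn = target;
			if (!moved)
				src->mapcount++;  /* mapping restored to the source */
		}
	}
	return target;
}
```

With every mapping reachable the page moves; with one mapping left behind the attempt fails and all slots end up as normal entries for the source page again.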
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-22 7:17 ` Naoya Horiguchi @ 2016-07-26 7:58 ` Michal Hocko 2016-07-26 14:04 ` zhong jiang 0 siblings, 1 reply; 26+ messages in thread From: Michal Hocko @ 2016-07-26 7:58 UTC (permalink / raw) To: zhong jiang, Naoya Horiguchi Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On Fri 22-07-16 07:17:37, Naoya Horiguchi wrote: [...] > I think that (src_pte != dst_pte) can happen and that's ok if there's no > migration entry. We have discussed that with Naoya off-list and couldn't find a scenario when parent would have !shared pmd while child would have it. The only plausible scenario was that parent created and populated mapping smaller than 1G and then enlarged it later on so the child would see shareable pud. This doesn't seem to be possible because vma_merge would bail out due to VM_SPECIAL check. > But even if we have both of normal entry and migration entry > for one hugepage, that still looks fine to me because the running migration > operation fails (because there remains mapcounts on the source hugepage), > and all migration entries are turned back to normal entries pointing to the > source hugepage. Agreed. > Could you try to see and share what happens on your workload with > Michal's patch? Zhong Jiang did you have chance to retest with the BUG_ON changed? -- Michal Hocko SUSE Labs
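The 1G condition Michal mentions comes from how hugetlb PMD sharing decides whether a VMA qualifies. Below is a sketch modeled on the kernel's vma_shareable() helper of this period, assuming x86-64 with a 1 GiB PUD_SIZE; `model_vma` and the `MODEL_` constants are hypothetical stand-ins. The VMA must be a shared mapping and must cover the entire PUD-aligned 1 GiB range around the address, which is why a parent mapping smaller than 1G would not have a shareable pud.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: x86-64 values, PUD_SIZE = 1 GiB. */
#define MODEL_PUD_SIZE (1UL << 30)
#define MODEL_PUD_MASK (~(MODEL_PUD_SIZE - 1))

struct model_vma {
	unsigned long start, end;  /* [start, end) */
	bool mayshare;             /* stands in for VM_MAYSHARE */
};

/*
 * Mirrors the shape of vma_shareable(): shared mapping, and the whole
 * PUD-aligned range containing addr lies inside the VMA.
 */
static bool model_vma_shareable(const struct model_vma *vma, unsigned long addr)
{
	unsigned long base = addr & MODEL_PUD_MASK;
	unsigned long end = base + MODEL_PUD_SIZE;

	return vma->mayshare && vma->start <= base && end <= vma->end;
}
```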
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-26 7:58 ` Michal Hocko @ 2016-07-26 14:04 ` zhong jiang 2016-07-27 14:44 ` Michal Hocko 0 siblings, 1 reply; 26+ messages in thread From: zhong jiang @ 2016-07-26 14:04 UTC (permalink / raw) To: Michal Hocko Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On 2016/7/26 15:58, Michal Hocko wrote: > On Fri 22-07-16 07:17:37, Naoya Horiguchi wrote: > [...] >> I think that (src_pte != dst_pte) can happen and that's ok if there's no >> migration entry. > We have discussed that with Naoya off-list and couldn't find a scenario > when parent would have !shared pmd while child would have it. The only > plausible scenario was that parent created and populated mapping smaller > than 1G and then enlarged it later on so the child would see shareable > pud. This doesn't seem to be possible because vma_merge would bail out > due to VM_SPECIAL check. I do not understand that the process must have vm_special flags. if vm_special enable, the process must not be expanded. and what does it matter about vma_merge ?? >> But even if we have both of normal entry and migration entry >> for one hugepage, that still looks fine to me because the running migration >> operation fails (because there remains mapcounts on the source hugepage), >> and all migration entries are turned back to normal entries pointing to the >> source hugepage. In one case, try_to_unmap_one is first exec and successfully, mapcount turn into zero. then we get the pte lock, if src_pte != dst_pte, it maybe lead to the dst_pte is from migrate pte to normal pte, while the normal pte turn into migrate pte,, is right ? > > Agreed. > >> Could you try to see and share what happens on your workload with >> Michal's patch? > Zhong Jiang did you have chance to retest with the BUG_ON changed?
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-26 14:04 ` zhong jiang @ 2016-07-27 14:44 ` Michal Hocko 0 siblings, 0 replies; 26+ messages in thread From: Michal Hocko @ 2016-07-27 14:44 UTC (permalink / raw) To: zhong jiang Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm On Tue 26-07-16 22:04:16, zhong jiang wrote: > On 2016/7/26 15:58, Michal Hocko wrote: > > On Fri 22-07-16 07:17:37, Naoya Horiguchi wrote: > > [...] > >> I think that (src_pte != dst_pte) can happen and that's ok if there's no > >> migration entry. > > We have discussed that with Naoya off-list and couldn't find a scenario > > when parent would have !shared pmd while child would have it. The only > > plausible scenario was that parent created and populated mapping smaller > > than 1G and then enlarged it later on so the child would see shareable > > pud. This doesn't seem to be possible because vma_merge would bail out > > due to VM_SPECIAL check. > I do not understand that the process must have vm_special flags. if > vm_special enable, the process must not be expanded. and what does it > matter about vma_merge ?? See the check in vma_merge: if (vm_flags & VM_SPECIAL) return NULL; > >> But even if we have both of normal entry and migration entry > >> for one hugepage, that still looks fine to me because the running migration > >> operation fails (because there remains mapcounts on the source hugepage), > >> and all migration entries are turned back to normal entries pointing to the > >> source hugepage. > > In one case, try_to_unmap_one is first exec and successfully, mapcount > turn into zero. then we get the pte lock, if src_pte != dst_pte, it > maybe lead to the dst_pte is from migrate pte to normal pte, while the > normal pte turn into migrate pte,, is right ? I am sorry, but I have a hard time following your argument here. Could you be more specific, please? > > Agreed.
> > > >> Could you try to see and share what happens on your workload with > >> Michal's patch? > > > > Zhong Jiang did you have chance to retest with the BUG_ON changed? Any update on this? -- Michal Hocko SUSE Labs
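The vma_merge() bail-out Michal points to can be sketched as below. Assumptions: the flag values are made up for this model (the real ones live in include/linux/mm.h), the VM_SPECIAL composition follows its kernel definition of this era (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP), and hugetlbfs mappings carry VM_DONTEXPAND, as set by hugetlbfs_file_mmap(). Under those assumptions a hugetlb VMA never passes the first check in vma_merge(), so it is not merged into a larger neighbour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define M_VM_IO          0x1UL
#define M_VM_DONTEXPAND  0x2UL
#define M_VM_PFNMAP      0x4UL
#define M_VM_MIXEDMAP    0x8UL
#define M_VM_HUGETLB     0x10UL

/* Same composition as the kernel's VM_SPECIAL of this period. */
#define M_VM_SPECIAL (M_VM_IO | M_VM_DONTEXPAND | M_VM_PFNMAP | M_VM_MIXEDMAP)

/* Sketch of the early bail-out in vma_merge(): special mappings never merge. */
static bool model_vma_merge_allowed(unsigned long vm_flags)
{
	if (vm_flags & M_VM_SPECIAL)
		return false;
	return true;  /* all other merge conditions are omitted in this model */
}
```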
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-21 7:43 ` + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree Michal Hocko 2016-07-21 8:13 ` Naoya Horiguchi 2016-07-21 10:54 ` zhong jiang @ 2016-07-29 11:27 ` Michal Hocko 2016-07-30 6:33 ` zhong jiang 2 siblings, 1 reply; 26+ messages in thread From: Michal Hocko @ 2016-07-29 11:27 UTC (permalink / raw) To: akpm Cc: zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm On Thu 21-07-16 09:43:40, Michal Hocko wrote: > We have further discussed the patch and I believe it is not correct. See [1]. > I am proposing the following alternative. Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It is clearly racy. Whether the BUG_ON update is really the right and sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang testing. > [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz > --- > From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Thu, 21 Jul 2016 09:28:13 +0200 > Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON > > Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he > runs his database load with memory online and offline running in > parallel. The reason is that huge_pmd_share might detect a shared pmd > which is currently migrated and so it has migration pte which is > !pte_huge. > > There doesn't seem to be any easy way to prevent from the race and in > fact seeing the migration swap entry is not harmful. Both callers of > huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range > will copy the swap entry and make it COW if needed. hugetlb_fault will > back off and so the page fault is retries if the page is still under > migration and waits for its completion in hugetlb_fault. > > That means that the BUG_ON is wrong and we should update it. 
Let's > simply check that all present ptes are pte_huge instead. > > Reported-by: zhongjiang <zhongjiang@huawei.com> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/hugetlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 34379d653aa3..31dd2b8b86b3 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, > pte = (pte_t *)pmd_alloc(mm, pud, addr); > } > } > - BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte)); > + BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)); > > return pte; > } > -- > 2.8.1 > > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs
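The effect of the one-line change on the assertion in huge_pte_alloc can be checked with a truth-table sketch. The struct below is a hypothetical three-flag model of what pte_none(), pte_present() and pte_huge() report. Following the commit message, a migration entry is modeled as non-empty, not present and not huge: the old predicate fires on it while the new one does not, and both still catch a present entry that lacks the huge bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three properties the BUG_ON tests. */
struct model_pte_state {
	bool none;     /* pte_none(): empty slot */
	bool present;  /* pte_present(): maps a page */
	bool huge;     /* pte_huge(): huge-page bit set */
};

/* Old check: BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte)) */
static bool old_bug_fires(const struct model_pte_state *p)
{
	return p && !p->none && !p->huge;
}

/* New check: BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)) */
static bool new_bug_fires(const struct model_pte_state *p)
{
	return p && p->present && !p->huge;
}
```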
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree 2016-07-29 11:27 ` Michal Hocko @ 2016-07-30 6:33 ` zhong jiang 2016-08-01 11:02 ` Michal Hocko 0 siblings, 1 reply; 26+ messages in thread From: zhong jiang @ 2016-07-30 6:33 UTC (permalink / raw) To: Michal Hocko Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi, Mel Gorman, linux-mm On 2016/7/29 19:27, Michal Hocko wrote: > On Thu 21-07-16 09:43:40, Michal Hocko wrote: >> We have further discussed the patch and I believe it is not correct. See [1]. >> I am proposing the following alternative. > Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It > is clearly racy. Whether the BUG_ON update is really the right and > sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang > testing. The issue is very hard to recur. Without attaching any patch to kernel code. up to now, it still not happens to it. Thanks zhongjiang >> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz >> --- >> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001 >> From: Michal Hocko <mhocko@suse.com> >> Date: Thu, 21 Jul 2016 09:28:13 +0200 >> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON >> >> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he >> runs his database load with memory online and offline running in >> parallel. The reason is that huge_pmd_share might detect a shared pmd >> which is currently migrated and so it has migration pte which is >> !pte_huge. >> >> There doesn't seem to be any easy way to prevent from the race and in >> fact seeing the migration swap entry is not harmful. Both callers of >> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range >> will copy the swap entry and make it COW if needed. hugetlb_fault will >> back off and so the page fault is retries if the page is still under >> migration and waits for its completion in hugetlb_fault. 
>> >> That means that the BUG_ON is wrong and we should update it. Let's >> simply check that all present ptes are pte_huge instead. >> >> Reported-by: zhongjiang <zhongjiang@huawei.com> >> Signed-off-by: Michal Hocko <mhocko@suse.com> >> --- >> mm/hugetlb.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 34379d653aa3..31dd2b8b86b3 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, >> pte = (pte_t *)pmd_alloc(mm, pud, addr); >> } >> } >> - BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte)); >> + BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)); >> >> return pte; >> } >> -- >> 2.8.1 >> >> -- >> Michal Hocko >> SUSE Labs
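The commit message's claim that hugetlb_fault copes with a migration entry returned by huge_pte_alloc can be illustrated with a decision sketch. It loosely follows the structure of hugetlb_fault() in kernels of this period: an early lookup that waits on a migration entry, then a re-read after huge_pte_alloc() that backs off on any non-present entry so the fault is retried. The enums and `model_hugetlb_fault` are hypothetical, and locking and the fault mutex are omitted.

```c
#include <assert.h>

/* Hypothetical model of what a fault sees in the huge PTE slot. */
enum fe_entry { FE_NONE, FE_PRESENT, FE_MIGRATION };

enum fr_result {
	FR_WAIT_MIGRATION,  /* wait for migration to finish, then return */
	FR_NO_PAGE,         /* empty slot: instantiate a page */
	FR_BACK_OFF,        /* non-present entry: return and let it refault */
	FR_HANDLE_PRESENT   /* normal path: COW, dirty/young updates */
};

/*
 * first_look:  entry seen by the initial lookup, before huge_pte_alloc().
 * second_look: entry re-read from the slot huge_pte_alloc() returned.
 * The entry can turn into a migration entry between the two looks.
 */
static enum fr_result model_hugetlb_fault(enum fe_entry first_look,
					  enum fe_entry second_look)
{
	if (first_look == FE_MIGRATION)
		return FR_WAIT_MIGRATION;
	if (second_look == FE_NONE)
		return FR_NO_PAGE;
	if (second_look != FE_PRESENT)
		return FR_BACK_OFF;
	return FR_HANDLE_PRESENT;
}
```

The FE_PRESENT then FE_MIGRATION case models the race from this thread: the fault returns without touching the entry, and the retried fault takes the first branch.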
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-30 6:33 ` zhong jiang
@ 2016-08-01 11:02 ` Michal Hocko
  2016-08-01 15:04 ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-08-01 11:02 UTC
  To: zhong jiang
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi,
	Mel Gorman, linux-mm

On Sat 30-07-16 14:33:18, zhong jiang wrote:
> On 2016/7/29 19:27, Michal Hocko wrote:
> > On Thu 21-07-16 09:43:40, Michal Hocko wrote:
> >> We have further discussed the patch and I believe it is not correct. See [1].
> >> I am proposing the following alternative.
> > Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
> > is clearly racy. Whether the BUG_ON update is really the right and
> > sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang
> > testing.
>
> The issue is very hard to recur. Without attaching any patch to
> kernel code. up to now, it still not happens to it.

Hmm, OK. So what do you propose? Are you OK with the BUG_ON change or do
you think that this needs a deeper fix?
-- 
Michal Hocko
SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-08-01 11:02 ` Michal Hocko
@ 2016-08-01 15:04 ` zhong jiang
  2016-08-01 15:31 ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-08-01 15:04 UTC
  To: Michal Hocko
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Naoya Horiguchi,
	Mel Gorman, linux-mm

On 2016/8/1 19:02, Michal Hocko wrote:
> On Sat 30-07-16 14:33:18, zhong jiang wrote:
>> On 2016/7/29 19:27, Michal Hocko wrote:
>>> On Thu 21-07-16 09:43:40, Michal Hocko wrote:
>>>> We have further discussed the patch and I believe it is not correct. See [1].
>>>> I am proposing the following alternative.
>>> Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
>>> is clearly racy. Whether the BUG_ON update is really the right and
>>> sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang
>>> testing.
>> The issue is very hard to recur. Without attaching any patch to
>> kernel code. up to now, it still not happens to it.
> Hmm, OK. So what do you propose? Are you OK with the BUG_ON change or do
> you think that this needs a deeper fix?

yes, I agree with your change.
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-08-01 15:04 ` zhong jiang
@ 2016-08-01 15:31 ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2016-08-01 15:31 UTC
  To: akpm
  Cc: zhong jiang, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On Mon 01-08-16 23:04:01, zhong jiang wrote:
> On 2016/8/1 19:02, Michal Hocko wrote:
> > On Sat 30-07-16 14:33:18, zhong jiang wrote:
> >> On 2016/7/29 19:27, Michal Hocko wrote:
> >>> On Thu 21-07-16 09:43:40, Michal Hocko wrote:
> >>>> We have further discussed the patch and I believe it is not correct. See [1].
> >>>> I am proposing the following alternative.
> >>> Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
> >>> is clearly racy. Whether the BUG_ON update is really the right and
> >>> sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang
> >>> testing.
> >> The issue is very hard to recur. Without attaching any patch to
> >> kernel code. up to now, it still not happens to it.
> > Hmm, OK. So what do you propose? Are you OK with the BUG_ON change or do
> > you think that this needs a deeper fix?
>
> yes, I agree with your change.

OK, Andrew, could you merge
http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.cz
with ack from Naoya
http://lkml.kernel.org/r/20160721081355.GB25398@hori1.linux.bs1.fc.nec.co.jp

Thanks!
-- 
Michal Hocko
SUSE Labs
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
       [not found] <003701d1e328$202ca9d0$6085fd70$@alibaba-inc.com>
@ 2016-07-21 8:19 ` Hillf Danton
  0 siblings, 0 replies; 26+ messages in thread
From: Hillf Danton @ 2016-07-21 8:19 UTC
  To: Michal Hocko
  Cc: 'zhongjiang', linux-kernel, linux-mm, Andrew Morton

>
> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Jul 2016 09:28:13 +0200
> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
>
> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
> runs his database load with memory online and offline running in
> parallel. The reason is that huge_pmd_share might detect a shared pmd
> which is currently migrated and so it has migration pte which is
> !pte_huge.
>
> There doesn't seem to be any easy way to prevent from the race and in
> fact seeing the migration swap entry is not harmful. Both callers of
> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
> will copy the swap entry and make it COW if needed. hugetlb_fault will
> back off and so the page fault is retries if the page is still under
> migration and waits for its completion in hugetlb_fault.
>
> That means that the BUG_ON is wrong and we should update it. Let's
> simply check that all present ptes are pte_huge instead.
>
> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>

>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 34379d653aa3..31dd2b8b86b3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>  
>  	return pte;
>  }
> -- 
> 2.8.1
end of thread

Thread overview: 26+ messages
[not found] <578eb28b.YbRUDGz5RloTVlrE%akpm@linux-foundation.org>
2016-07-21 7:43 ` + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree Michal Hocko
2016-07-21 8:13 ` Naoya Horiguchi
2016-07-21 10:29 ` Michal Hocko
2016-07-21 10:54 ` zhong jiang
2016-07-21 11:27 ` Michal Hocko
2016-07-21 12:14 ` zhong jiang
2016-07-21 12:30 ` Michal Hocko
2016-07-21 12:45 ` zhong jiang
2016-07-21 12:55 ` Michal Hocko
2016-07-21 13:25 ` zhong jiang
2016-07-21 13:40 ` Michal Hocko
2016-07-21 13:58 ` zhong jiang
2016-07-21 14:01 ` Michal Hocko
2016-07-21 14:13 ` zhong jiang
2016-07-21 14:27 ` Michal Hocko
2016-07-21 14:33 ` zhong jiang
2016-07-22 7:17 ` Naoya Horiguchi
2016-07-26 7:58 ` Michal Hocko
2016-07-26 14:04 ` zhong jiang
2016-07-27 14:44 ` Michal Hocko
2016-07-29 11:27 ` Michal Hocko
2016-07-30 6:33 ` zhong jiang
2016-08-01 11:02 ` Michal Hocko
2016-08-01 15:04 ` zhong jiang
2016-08-01 15:31 ` Michal Hocko
[not found] <003701d1e328$202ca9d0$6085fd70$@alibaba-inc.com>
2016-07-21 8:19 ` Hillf Danton