linux-mm.kvack.org archive mirror
* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
       [not found] <578eb28b.YbRUDGz5RloTVlrE%akpm@linux-foundation.org>
@ 2016-07-21  7:43 ` Michal Hocko
  2016-07-21  8:13   ` Naoya Horiguchi
                     ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Michal Hocko @ 2016-07-21  7:43 UTC (permalink / raw)
  To: akpm
  Cc: zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

We have further discussed the patch and I believe it is not correct. See [1].
I am proposing the following alternative.

[1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
---

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21  7:43 ` + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree Michal Hocko
@ 2016-07-21  8:13   ` Naoya Horiguchi
  2016-07-21 10:29     ` Michal Hocko
  2016-07-21 10:54   ` zhong jiang
  2016-07-29 11:27   ` Michal Hocko
  2 siblings, 1 reply; 26+ messages in thread
From: Naoya Horiguchi @ 2016-07-21  8:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Mel Gorman, linux-mm

On Thu, Jul 21, 2016 at 09:43:40AM +0200, Michal Hocko wrote:
> We have further discussed the patch and I believe it is not correct. See [1].
> I am proposing the following alternative.
>
> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
> ---
> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Jul 2016 09:28:13 +0200
> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
>
> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
> runs his database load with memory online and offline running in
> parallel. The reason is that huge_pmd_share might detect a shared pmd
> which is currently migrated and so it has migration pte which is
> !pte_huge.
>
> There doesn't seem to be any easy way to prevent the race and in
> fact seeing the migration swap entry is not harmful. Both callers of
> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
> will copy the swap entry and make it COW if needed. hugetlb_fault will
> back off and so the page fault is retried if the page is still under
> migration and waits for its completion in hugetlb_fault.
>
> That means that the BUG_ON is wrong and we should update it. Let's
> simply check that all present ptes are pte_huge instead.
>
> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

In the early days of hugetlb, we assumed that !pte_none is
equivalent to pmd_present() because there was no valid non-present entry
on huge_pte. The situation has changed with hugepage migration and/or hwpoison,
so we have to keep these two cases separate and make sure that
pte_present is true before checking pte_huge.
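
As an illustration of that distinction, here is a minimal userspace sketch
(not kernel code). Every model_* name and the MODEL_SMALL state are invented
for this example; the real predicates operate on the architecture's pte_t.
It only compares when the old and the new BUG_ON condition would fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a huge pte slot; not the kernel's pte_t. */
enum model_pte {
	MODEL_NONE,       /* empty slot */
	MODEL_HUGE,       /* present and huge: the normal mapped case */
	MODEL_SMALL,      /* present but not huge: what the check guards against */
	MODEL_MIGRATION,  /* non-present migration entry */
	MODEL_HWPOISON    /* non-present hwpoison entry */
};

static bool model_pte_none(enum model_pte p)
{
	return p == MODEL_NONE;
}

static bool model_pte_present(enum model_pte p)
{
	return p == MODEL_HUGE || p == MODEL_SMALL;
}

static bool model_pte_huge(enum model_pte p)
{
	return p == MODEL_HUGE;
}

/* Old condition: every non-empty entry is expected to be huge. */
static bool old_check_fires(enum model_pte p)
{
	return !model_pte_none(p) && !model_pte_huge(p);
}

/* New condition: only present entries are expected to be huge. */
static bool new_check_fires(enum model_pte p)
{
	return model_pte_present(p) && !model_pte_huge(p);
}
```

In this model the old condition fires for migration and hwpoison entries,
while the new one fires only for a present entry that is not huge.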

So I think this change is right. Thank you Zhong, Michal.

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21  8:13   ` Naoya Horiguchi
@ 2016-07-21 10:29     ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2016-07-21 10:29 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: akpm, zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Mel Gorman, linux-mm

On Thu 21-07-16 08:13:55, Naoya Horiguchi wrote:
> In the early days of hugetlb, we assumed that !pte_none is
> equivalent to pmd_present() because there was no valid non-present entry
> on huge_pte. The situation has changed with hugepage migration and/or hwpoison,
> so we have to keep these two cases separate and make sure that
> pte_present is true before checking pte_huge.
> 
> So I think this change is right. Thank you Zhong, Michal.
> 
> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thank you for double checking, Naoya!

IIUC
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")

should help. Maybe we should even tag that for stable?
-- 
Michal Hocko
SUSE Labs


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21  7:43 ` + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree Michal Hocko
  2016-07-21  8:13   ` Naoya Horiguchi
@ 2016-07-21 10:54   ` zhong jiang
  2016-07-21 11:27     ` Michal Hocko
  2016-07-29 11:27   ` Michal Hocko
  2 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-21 10:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/21 15:43, Michal Hocko wrote:
> We have further discussed the patch and I believe it is not correct. See [1].
> I am proposing the following alternative.
>
> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
> ---
> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Jul 2016 09:28:13 +0200
> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
>
> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
> runs his database load with memory online and offline running in
> parallel. The reason is that huge_pmd_share might detect a shared pmd
> which is currently migrated and so it has migration pte which is
> !pte_huge.
>
> There doesn't seem to be any easy way to prevent the race and in
> fact seeing the migration swap entry is not harmful. Both callers of
> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
> will copy the swap entry and make it COW if needed. hugetlb_fault will
> back off and so the page fault is retried if the page is still under
> migration and waits for its completion in hugetlb_fault.
>
> That means that the BUG_ON is wrong and we should update it. Let's
> simply check that all present ptes are pte_huge instead.
>
> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 34379d653aa3..31dd2b8b86b3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>  
>  	return pte;
>  }
  I don't think the patch fixes the problem. The explanation is as follows.

          cpu0                                          cpu1
  copy_hugetlb_page_range                       try_to_unmap_one
    huge_pte_alloc       # pmd may be shared
    lock dst_pte         # dst_pte may be a migration entry
    lock src_pte         # src_pte may be a normal pte
    set_huge_pte_at      # dst_pte becomes a normal pte
    spin_unlock(src_ptl)
                                                  lock src_pte
    spin_unlock(dst_ptl)                          set src_pte to a migration entry
                                                  spin_unlock(src_pte)
  * dst_pte is a normal pte, but the corresponding
    pfn is under migration. This is dangerous.

This race may occur, is that right? If the scenario exists, we should think about it more.
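
The ordering in question can be replayed in a single-threaded userspace
sketch. This is only a model under stated assumptions: struct model_pte and
the model_* helpers are invented for illustration, and locking, rmap and the
real page table layout are ignored. It shows the end state the diagram
describes, not that the kernel can actually reach it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one huge pte: maps a pfn or holds a migration entry. */
struct model_pte {
	bool migration;     /* true: migration entry, the pfn is being moved */
	unsigned long pfn;
};

/* cpu0 step: copy the source entry into the destination (fork-path copy). */
static void model_copy_entry(struct model_pte *dst, const struct model_pte *src)
{
	*dst = *src;
}

/* cpu1 step: replace a mapped entry with a migration entry (unmap path). */
static void model_unmap_for_migration(struct model_pte *pte)
{
	pte->migration = true;
}

/* True when dst still maps a pfn that src already marks as under migration. */
static bool model_stale_mapping(const struct model_pte *dst,
				const struct model_pte *src)
{
	return !dst->migration && src->migration && dst->pfn == src->pfn;
}
```

Calling model_copy_entry() before model_unmap_for_migration() leaves the
stale state; in the opposite order the destination simply receives the
migration entry.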

Thanks
zhongjiang

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 10:54   ` zhong jiang
@ 2016-07-21 11:27     ` Michal Hocko
  2016-07-21 12:14       ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-21 11:27 UTC (permalink / raw)
  To: zhong jiang
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On Thu 21-07-16 18:54:09, zhong jiang wrote:
> I don't think the patch fixes the problem. The explanation is as follows.
>
>         cpu0                                          cpu1
> copy_hugetlb_page_range                       try_to_unmap_one
>   huge_pte_alloc       # pmd may be shared
>   lock dst_pte         # dst_pte may be a migration entry
>   lock src_pte         # src_pte may be a normal pte
>   set_huge_pte_at      # dst_pte becomes a normal pte
>   spin_unlock(src_ptl)
>                                                 lock src_pte
>   spin_unlock(dst_ptl)                          set src_pte to a migration entry
>                                                 spin_unlock(src_pte)
> * dst_pte is a normal pte, but the corresponding
>   pfn is under migration. This is dangerous.
>
> This race may occur, is that right? If the scenario exists, we should think about it more.

Can this happen at all? copy_hugetlb_page_range does the following to
rule out shared page table entries. At least that is my understanding of
c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")

		/* If the pagetables are shared don't copy or take references */
		if (dst_pte == src_pte)
			continue;
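
For illustration only, the effect of that check can be sketched in userspace.
struct model_slot and model_copy_range() are made-up names, and one slot
stands in for one huge pte; this is an assumption-laden model, not the code
in mm/hugetlb.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one slot stands for one huge pte. */
struct model_slot {
	int value;
};

/*
 * Copy src[i] into dst[i] for each index, skipping indexes where both
 * arrays point at the same slot (a shared page table in this model).
 * Returns the number of entries actually copied.
 */
static size_t model_copy_range(struct model_slot **dst,
			       struct model_slot **src, size_t n)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		/* Shared: nothing to copy, no extra reference to take. */
		if (dst[i] == src[i])
			continue;
		dst[i]->value = src[i]->value;
		copied++;
	}
	return copied;
}
```

A shared slot is never written by the copy, so in this model the copy path
cannot change what a shared entry contains.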
-- 
Michal Hocko
SUSE Labs


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 11:27     ` Michal Hocko
@ 2016-07-21 12:14       ` zhong jiang
  2016-07-21 12:30         ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-21 12:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/21 19:27, Michal Hocko wrote:
> Can this happen at all? copy_hugetlb_page_range does the following to
> rule out shared page table entries. At least that is my understanding of
> c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
>
> 		/* If the pagetables are shared don't copy or take references */
> 		if (dst_pte == src_pte)
> 			continue;
  The mapping that vm_file points to can be shared. I am not sure, but if it is, the race is possible.
  Of course, in that case src_pte is the same as dst_pte.

  When dst_pte is a migration entry and src_pte is a normal entry, if try_to_unmap_one succeeds
  and copy_hugetlb_page_range then runs, dst_pte is left in a dangerous state.
 


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 12:14       ` zhong jiang
@ 2016-07-21 12:30         ` Michal Hocko
  2016-07-21 12:45           ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-21 12:30 UTC (permalink / raw)
  To: zhong jiang
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On Thu 21-07-16 20:14:41, zhong jiang wrote:
> On 2016/7/21 19:27, Michal Hocko wrote:
> > Can this happen at all? copy_hugetlb_page_range does the following to
> > rule out shared page table entries. At least that is my understanding of
> > c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
> >
> > 		/* If the pagetables are shared don't copy or take references */
> > 		if (dst_pte == src_pte)
> > 			continue;
> 
> The mapping that vm_file points to can be shared. I am not sure, but
> if it is, the race is possible. Of course, in that case src_pte is
> the same as dst_pte.

I am not sure I understand. This is a fork path where the ptes are
copied over from the parent to the child. So how would vm_file differ?

-- 
Michal Hocko
SUSE Labs


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 12:30         ` Michal Hocko
@ 2016-07-21 12:45           ` zhong jiang
  2016-07-21 12:55             ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-21 12:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/21 20:30, Michal Hocko wrote:
> On Thu 21-07-16 20:14:41, zhong jiang wrote:
>> On 2016/7/21 19:27, Michal Hocko wrote:
>>> Can this happen at all? copy_hugetlb_page_range does the following to
>>> rule out shared page table entries. At least that is my understanding of
>>> c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
>>>
>>> 		/* If the pagetables are shared don't copy or take references */
>>> 		if (dst_pte == src_pte)
>>> 			continue;
>> The mapping that vm_file points to can be shared. I am not sure, but
>> if it is, the race is possible. Of course, in that case src_pte is
>> the same as dst_pte.
> I am not sure I understand. This is a fork path where the ptes are
> copied over from the parent to the child. So how would vm_file differ?
  I think you may have misunderstood me. The mapping that a file refers to can be shared by other processes;
  the parent process has the mapping, but it is not the only one. This is only my view. Is that right?


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 12:45           ` zhong jiang
@ 2016-07-21 12:55             ` Michal Hocko
  2016-07-21 13:25               ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-21 12:55 UTC (permalink / raw)
  To: zhong jiang
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On Thu 21-07-16 20:45:15, zhong jiang wrote:
> On 2016/7/21 20:30, Michal Hocko wrote:
> > On Thu 21-07-16 20:14:41, zhong jiang wrote:
> >> On 2016/7/21 19:27, Michal Hocko wrote:
> >>> Can this happen at all? copy_hugetlb_page_range does the following to
> >>> rule out shared page table entries. At least that is my understanding of
> >>> c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
> >>>
> >>> 		/* If the pagetables are shared don't copy or take references */
> >>> 		if (dst_pte == src_pte)
> >>> 			continue;
> >> vm_file points to mapping should be shared, I am not sure, if it is
> >> so, the possibility is exist. of course, src_pte is the same as the
> >> dst_pte.
> > I am not sure I understand. This is a fork path where the ptes are
> > copied over from the parent to the child. So how would vm_file differ?
>
> I think you can misunderstand my meaning.  A file refers to the
> mapping field can be shared by other process, parent process have the
> mapping , but is not only.  This is only my viewpoint. is right ??

OK, now I understand what you mean. So you mean that a different process
initiates the migration while this path copies the pte. That is certainly
possible but I still fail to see what the problem with that is.
huge_pte_alloc will return the identical pte whether it is a regular or
a migration one. So what exactly is the problem?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 12:55             ` Michal Hocko
@ 2016-07-21 13:25               ` zhong jiang
  2016-07-21 13:40                 ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-21 13:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/21 20:55, Michal Hocko wrote:
> On Thu 21-07-16 20:45:15, zhong jiang wrote:
>> On 2016/7/21 20:30, Michal Hocko wrote:
>>> On Thu 21-07-16 20:14:41, zhong jiang wrote:
>>>> On 2016/7/21 19:27, Michal Hocko wrote:
>>>>> On Thu 21-07-16 18:54:09, zhong jiang wrote:
>>>>>> On 2016/7/21 15:43, Michal Hocko wrote:
>>>>>>> We have further discussed the patch and I believe it is not correct. See [1].
>>>>>>> I am proposing the following alternative.
>>>>>>>
>>>>>>> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
>>>>>>> ---
>>>>>>> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
>>>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>>>> Date: Thu, 21 Jul 2016 09:28:13 +0200
>>>>>>> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
>>>>>>>
>>>>>>> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
>>>>>>> runs his database load with memory online and offline running in
>>>>>>> parallel. The reason is that huge_pmd_share might detect a shared pmd
>>>>>>> which is currently migrated and so it has migration pte which is
>>>>>>> !pte_huge.
>>>>>>>
>>>>>>> There doesn't seem to be any easy way to prevent from the race and in
>>>>>>> fact seeing the migration swap entry is not harmful. Both callers of
>>>>>>> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
>>>>>>> will copy the swap entry and make it COW if needed. hugetlb_fault will
>>>>>>> back off and so the page fault is retries if the page is still under
>>>>>>> migration and waits for its completion in hugetlb_fault.
>>>>>>>
>>>>>>> That means that the BUG_ON is wrong and we should update it. Let's
>>>>>>> simply check that all present ptes are pte_huge instead.
>>>>>>>
>>>>>>> Reported-by: zhongjiang <zhongjiang@huawei.com>
>>>>>>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>>>>>>> ---
>>>>>>>  mm/hugetlb.c | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>> index 34379d653aa3..31dd2b8b86b3 100644
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>>>>>>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>>>>>>  		}
>>>>>>>  	}
>>>>>>> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>>>>>>> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>>>>>>>  
>>>>>>>  	return pte;
>>>>>>>  }
>>>>>>   I don't think that the patch can fix the question.   The explain is as follow.
>>>>>>
>>>>>>           cpu0                                   cpu1
>>>>>> copy_hugetlb_page_range                    try_to_unmap_one
>>>>>>   huge_pte_alloc      #pmd may be shared
>>>>>>   lock dst_pte        #dst_pte may be migrate
>>>>>>   lock src_pte        #src_pte may be normal pt1
>>>>>>   set_huge_pte_at     #dst_pte points to normal
>>>>>>   spin_unlock (src_pt1)
>>>>>>                                                lock src_pte
>>>>>>   spin_unlock(dst_pt1)                         set src_pte migrate entry
>>>>>>                                                spin_unlock(src_pte)
>>>>>>   * dst_pte is a normal pte, but corresponding to the
>>>>>>     pfn is under migrate.  it is dangerous.
>>>>>>
>>>>>> The race may occur. is right ?  if the scenario exist.  we should think about more.
>>>>> Can this happen at all? copy_hugetlb_page_range does the following to
>>>>> rule out shared page table entries. At least that is my understanding of
>>>>> c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
>>>>>
>>>>> 		/* If the pagetables are shared don't copy or take references */
>>>>> 		if (dst_pte == src_pte)
>>>>> 			continue;
>>>> vm_file points to mapping should be shared, I am not sure, if it is
>>>> so, the possibility is exist. of course, src_pte is the same as the
>>>> dst_pte.
>>> I am not sure I understand. This is a fork path where the ptes are
>>> copied over from the parent to the child. So how would vm_file differ?
>> I think you can misunderstand my meaning.  A file refers to the
>> mapping field can be shared by other process, parent process have the
>> mapping , but is not only.  This is only my viewpoint. is right ??
> OK, now I understand what you mean. So you mean that a different process
> initiates the migration while this path copies to pte. That is certainly
> possible but I still fail to see what is the problem about that.
> huge_pte_alloc will return the identical pte whether it is regular or
> migration one. So what exactly is the problem?
>
  copy_hugetlb_page_range obtains the shared dst_pte, which may not be equal to
  the src_pte. The dst_pte can come from another process sharing the mapping.

		/* If the pagetables are shared don't copy or take references */
		if (dst_pte == src_pte)
			continue;

  Even in the fork path, we scan the i_mmap to find the same pte. I think that
  dst_pte may come from another process rather than the parent, which would
  make dst_pte not equal to the src_pte from the parent.

    vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {

  Is that right?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 13:25               ` zhong jiang
@ 2016-07-21 13:40                 ` Michal Hocko
  2016-07-21 13:58                   ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-21 13:40 UTC (permalink / raw)
  To: Naoya Horiguchi, zhong jiang
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm

On Thu 21-07-16 21:25:38, zhong jiang wrote:
> On 2016/7/21 20:55, Michal Hocko wrote:
[...]
> > OK, now I understand what you mean. So you mean that a different process
> > initiates the migration while this path copies to pte. That is certainly
> > possible but I still fail to see what is the problem about that.
> > huge_pte_alloc will return the identical pte whether it is regular or
> > migration one. So what exactly is the problem?
> >
> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal
> to the src_pte.  The dst_pte can come from other process sharing the
> mapping.

So you mean that the parent doesn't have the shared pte while the child
would get one?
 
> 		/* If the pagetables are shared don't copy or take references */
> 		if (dst_pte == src_pte)
> 			continue;
>  
> Even it do the fork path, we scan the i_mmap to find same pte. I think
> that dst_pte may come from other process. It is not the parent. it
> will lead to the dst_pte is not equal to the src_pte from the parent.

Let's say this would be possible (I am not really sure, but for the sake
of argument). If the src is not shared while the dst is shared and the
page is under migration, then all the page table entries should be marked
as migration swap entries, no? If they are not and copy_hugetlb_page_range
cannot cope with that, then it is a bug in copy_hugetlb_page_range
which doesn't have anything to do with the BUG_ON in huge_pte_alloc.
So I would argue that if the problem exists at all, it is a separate
issue IMHO.

Naoya, could you comment on that please?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 13:40                 ` Michal Hocko
@ 2016-07-21 13:58                   ` zhong jiang
  2016-07-21 14:01                     ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-21 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits,
	Mike Kravetz, Mel Gorman, linux-mm

On 2016/7/21 21:40, Michal Hocko wrote:
> On Thu 21-07-16 21:25:38, zhong jiang wrote:
>> On 2016/7/21 20:55, Michal Hocko wrote:
> [...]
>>> OK, now I understand what you mean. So you mean that a different process
>>> initiates the migration while this path copies to pte. That is certainly
>>> possible but I still fail to see what is the problem about that.
>>> huge_pte_alloc will return the identical pte whether it is regular or
>>> migration one. So what exactly is the problem?
>>>
>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal
>> to the src_pte.  The dst_pte can come from other process sharing the
>> mapping.
> So you mean that the parent doesn't have the shared pte while the child
> would get one?
>  
   No, the parent must have the shared pte because the child copies the parent, but the parent is
  not the only source of a pte we can get. When we scan mapping->i_mmap, we can first obtain
  a shared pte from another process. But I am not sure.
>> 		/* If the pagetables are shared don't copy or take references */
>> 		if (dst_pte == src_pte)
>> 			continue;
>>  
>> Even it do the fork path, we scan the i_mmap to find same pte. I think
>> that dst_pte may come from other process. It is not the parent. it
>> will lead to the dst_pte is not equal to the src_pte from the parent.
> Let's say this would be possible (I am not really sure but for the sake
> of argumentation), if the src is not shared while dst is shared and the
> page is under migration then all the page table should be marked as
> swap migrate entries no? If they are not and copy_hugetlb_page_range
> cannot handle with that then it is a bug in copy_hugetlb_page_range
> which doesn't have anything to do with the BUG_ON in  huge_pte_alloc.
> So I would argue that if the problem exists at all it is a separate
> issue IMHO.
  Yes, it is a separate issue.
> Naoya, could you comment on that please?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 13:58                   ` zhong jiang
@ 2016-07-21 14:01                     ` Michal Hocko
  2016-07-21 14:13                       ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-21 14:01 UTC (permalink / raw)
  To: zhong jiang
  Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits,
	Mike Kravetz, Mel Gorman, linux-mm

On Thu 21-07-16 21:58:23, zhong jiang wrote:
> On 2016/7/21 21:40, Michal Hocko wrote:
> > On Thu 21-07-16 21:25:38, zhong jiang wrote:
> >> On 2016/7/21 20:55, Michal Hocko wrote:
> > [...]
> >>> OK, now I understand what you mean. So you mean that a different process
> >>> initiates the migration while this path copies to pte. That is certainly
> >>> possible but I still fail to see what is the problem about that.
> >>> huge_pte_alloc will return the identical pte whether it is regular or
> >>> migration one. So what exactly is the problem?
> >>>
> >> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal
> >> to the src_pte.  The dst_pte can come from other process sharing the
> >> mapping.
> > So you mean that the parent doesn't have the shared pte while the child
> > would get one?
> >  
>  no, parent must have the shared pte because the the child copy the
> parent. but parent is not the only source pte we can get. when we
> scan the maping->i_mmap, firstly ,it can obtain a shared pte from
> other process. but I am not sure.

But then all the shared ptes should be identical, no? Or am I missing
something?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 14:01                     ` Michal Hocko
@ 2016-07-21 14:13                       ` zhong jiang
  2016-07-21 14:27                         ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-21 14:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits,
	Mike Kravetz, Mel Gorman, linux-mm

On 2016/7/21 22:01, Michal Hocko wrote:
> On Thu 21-07-16 21:58:23, zhong jiang wrote:
>> On 2016/7/21 21:40, Michal Hocko wrote:
>>> On Thu 21-07-16 21:25:38, zhong jiang wrote:
>>>> On 2016/7/21 20:55, Michal Hocko wrote:
>>> [...]
>>>>> OK, now I understand what you mean. So you mean that a different process
>>>>> initiates the migration while this path copies to pte. That is certainly
>>>>> possible but I still fail to see what is the problem about that.
>>>>> huge_pte_alloc will return the identical pte whether it is regular or
>>>>> migration one. So what exactly is the problem?
>>>>>
>>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal
>>>> to the src_pte.  The dst_pte can come from other process sharing the
>>>> mapping.
>>> So you mean that the parent doesn't have the shared pte while the child
>>> would get one?
>>>  
>>  no, parent must have the shared pte because the the child copy the
>> parent. but parent is not the only source pte we can get. when we
>> scan the maping->i_mmap, firstly ,it can obtain a shared pte from
>> other process. but I am not sure.
> But then all the shared ptes should be identical, no? Or am I missing
> something?
 All the shared ptes should be identical, but there is a possibility that a new process
 wants to share the pte from a process other than the parent, when it is about to share
 a pte for the first time. Is that possible?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 14:13                       ` zhong jiang
@ 2016-07-21 14:27                         ` Michal Hocko
  2016-07-21 14:33                           ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-21 14:27 UTC (permalink / raw)
  To: zhong jiang
  Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits,
	Mike Kravetz, Mel Gorman, linux-mm

On Thu 21-07-16 22:13:55, zhong jiang wrote:
> On 2016/7/21 22:01, Michal Hocko wrote:
> > On Thu 21-07-16 21:58:23, zhong jiang wrote:
> >> On 2016/7/21 21:40, Michal Hocko wrote:
> >>> On Thu 21-07-16 21:25:38, zhong jiang wrote:
> >>>> On 2016/7/21 20:55, Michal Hocko wrote:
> >>> [...]
> >>>>> OK, now I understand what you mean. So you mean that a different process
> >>>>> initiates the migration while this path copies to pte. That is certainly
> >>>>> possible but I still fail to see what is the problem about that.
> >>>>> huge_pte_alloc will return the identical pte whether it is regular or
> >>>>> migration one. So what exactly is the problem?
> >>>>>
> >>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal
> >>>> to the src_pte.  The dst_pte can come from other process sharing the
> >>>> mapping.
> >>> So you mean that the parent doesn't have the shared pte while the child
> >>> would get one?
> >>>  
> >>  no, parent must have the shared pte because the the child copy the
> >> parent. but parent is not the only source pte we can get. when we
> >> scan the maping->i_mmap, firstly ,it can obtain a shared pte from
> >> other process. but I am not sure.
> > But then all the shared ptes should be identical, no? Or am I missing
> > something?
>  all the shared ptes should be identical, but  there is  a possibility that new process
>  want to share the pte from other process ,  other than the parent,  For the first time
>  the process is about to share pte with it.   is it possiblity?

I do not see how. They are operating on the same mapping so I really do
not see how a different process makes any difference.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 14:27                         ` Michal Hocko
@ 2016-07-21 14:33                           ` zhong jiang
  2016-07-22  7:17                             ` Naoya Horiguchi
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-21 14:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits,
	Mike Kravetz, Mel Gorman, linux-mm

On 2016/7/21 22:27, Michal Hocko wrote:
> On Thu 21-07-16 22:13:55, zhong jiang wrote:
>> On 2016/7/21 22:01, Michal Hocko wrote:
>>> On Thu 21-07-16 21:58:23, zhong jiang wrote:
>>>> On 2016/7/21 21:40, Michal Hocko wrote:
>>>>> On Thu 21-07-16 21:25:38, zhong jiang wrote:
>>>>>> On 2016/7/21 20:55, Michal Hocko wrote:
>>>>> [...]
>>>>>>> OK, now I understand what you mean. So you mean that a different process
>>>>>>> initiates the migration while this path copies to pte. That is certainly
>>>>>>> possible but I still fail to see what is the problem about that.
>>>>>>> huge_pte_alloc will return the identical pte whether it is regular or
>>>>>>> migration one. So what exactly is the problem?
>>>>>>>
>>>>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal
>>>>>> to the src_pte.  The dst_pte can come from other process sharing the
>>>>>> mapping.
>>>>> So you mean that the parent doesn't have the shared pte while the child
>>>>> would get one?
>>>>>  
>>>>  no, parent must have the shared pte because the the child copy the
>>>> parent. but parent is not the only source pte we can get. when we
>>>> scan the maping->i_mmap, firstly ,it can obtain a shared pte from
>>>> other process. but I am not sure.
>>> But then all the shared ptes should be identical, no? Or am I missing
>>> something?
>>  all the shared ptes should be identical, but  there is  a possibility that new process
>>  want to share the pte from other process ,  other than the parent,  For the first time
>>  the process is about to share pte with it.   is it possiblity?
> I do not see how. They are opperating on the same mapping so I really do
> not see how different process makes any difference.
>
   OK, in a word: the new process gets the shared pte, and the shared pte does not come from the parent process.
  So src_pte is not equal to dst_pte, because src_pte comes from the parent while dst_pte comes from
  another process. Obviously, they are not the same.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21 14:33                           ` zhong jiang
@ 2016-07-22  7:17                             ` Naoya Horiguchi
  2016-07-26  7:58                               ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Naoya Horiguchi @ 2016-07-22  7:17 UTC (permalink / raw)
  To: zhong jiang
  Cc: Michal Hocko, akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Mel Gorman, linux-mm

On Thu, Jul 21, 2016 at 10:33:47PM +0800, zhong jiang wrote:
> On 2016/7/21 22:27, Michal Hocko wrote:
> > On Thu 21-07-16 22:13:55, zhong jiang wrote:
> >> On 2016/7/21 22:01, Michal Hocko wrote:
> >>> On Thu 21-07-16 21:58:23, zhong jiang wrote:
> >>>> On 2016/7/21 21:40, Michal Hocko wrote:
> >>>>> On Thu 21-07-16 21:25:38, zhong jiang wrote:
> >>>>>> On 2016/7/21 20:55, Michal Hocko wrote:
> >>>>> [...]
> >>>>>>> OK, now I understand what you mean. So you mean that a different process
> >>>>>>> initiates the migration while this path copies to pte. That is certainly
> >>>>>>> possible but I still fail to see what is the problem about that.
> >>>>>>> huge_pte_alloc will return the identical pte whether it is regular or
> >>>>>>> migration one. So what exactly is the problem?
> >>>>>>>
> >>>>>> copy_hugetlb_page_range obtain the shared dst_pte, it may be not equal
> >>>>>> to the src_pte.  The dst_pte can come from other process sharing the
> >>>>>> mapping.
> >>>>> So you mean that the parent doesn't have the shared pte while the child
> >>>>> would get one?
> >>>>>  
> >>>>  no, parent must have the shared pte because the the child copy the
> >>>> parent. but parent is not the only source pte we can get. when we
> >>>> scan the maping->i_mmap, firstly ,it can obtain a shared pte from
> >>>> other process. but I am not sure.
> >>> But then all the shared ptes should be identical, no? Or am I missing
> >>> something?
> >>  all the shared ptes should be identical, but  there is  a possibility that new process
> >>  want to share the pte from other process ,  other than the parent,  For the first time
> >>  the process is about to share pte with it.   is it possiblity?
> > I do not see how. They are opperating on the same mapping so I really do
> > not see how different process makes any difference.
> >
>    ok , In a words . the new process get the shared pte, The shared pte not come from the parent process.
>   so , src_pte is not equal to dst_pte.  because src_pte come from the parent, while dst_pte come from
>   other process.    obviously, it is not same. 

I think that (src_pte != dst_pte) can happen and that's ok if there's no
migration entry.  But even if we have both a normal entry and a migration entry
for one hugepage, that still looks fine to me because the running migration
operation fails (because mapcounts remain on the source hugepage),
and all migration entries are turned back to normal entries pointing to the
source hugepage.

Could you try to see and share what happens on your workload with Michal's patch?
If something weird/critical still happens, let's merge your patch.
# I'm trying to write some test cases for it, but it might take some time ...

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-22  7:17                             ` Naoya Horiguchi
@ 2016-07-26  7:58                               ` Michal Hocko
  2016-07-26 14:04                                 ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-26  7:58 UTC (permalink / raw)
  To: zhong jiang, Naoya Horiguchi
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz, Mel Gorman, linux-mm

On Fri 22-07-16 07:17:37, Naoya Horiguchi wrote:
[...]
> I think that (src_pte != dst_pte) can happen and that's ok if there's no
> migration entry. 

We have discussed that with Naoya off-list and couldn't find a scenario
where the parent would have a !shared pmd while the child would have a shared
one. The only plausible scenario was that the parent created and populated a
mapping smaller than 1G and then enlarged it later on so the child would see a
shareable pud. This doesn't seem to be possible because vma_merge would bail out
due to the VM_SPECIAL check.

> But even if we have both of normal entry and migration entry
> for one hugepage, that still looks fine to me because the running migration
> operation fails (because there remains mapcounts on the source hugepage),
> and all migration entries are turned back to normal entries pointing to the
> source hugepage.

Agreed.

> Could you try to see and share what happens on your workload with
> Michal's patch?

Zhong Jiang, did you have a chance to retest with the BUG_ON changed?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-26  7:58                               ` Michal Hocko
@ 2016-07-26 14:04                                 ` zhong jiang
  2016-07-27 14:44                                   ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-26 14:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits,
	Mike Kravetz, Mel Gorman, linux-mm

On 2016/7/26 15:58, Michal Hocko wrote:
> On Fri 22-07-16 07:17:37, Naoya Horiguchi wrote:
> [...]
>> I think that (src_pte != dst_pte) can happen and that's ok if there's no
>> migration entry. 
> We have discussed that with Naoya off-list and couldn't find a scenario
> when parent would have !shared pmd while child would have it. The only
> plausible scenario was that parent created and poppulated mapping smaller
> than 1G and then enlarged it later on so the child would see sharedable
> pud. This doesn't seem to be possible because vma_merge would bail out
> due to VM_SPECIAL check.
  I do not understand why the process must have vm_special flags. If vm_special is enabled,
 the process must not be expanded. And what does that have to do with vma_merge?
>> But even if we have both of normal entry and migration entry
>> for one hugepage, that still looks fine to me because the running migration
>> operation fails (because there remains mapcounts on the source hugepage),
>> and all migration entries are turned back to normal entries pointing to the
>> source hugepage.
    In one case, try_to_unmap_one executes first and succeeds, so mapcount turns to zero.
   Then we get the pte lock. If src_pte != dst_pte, it may lead to the dst_pte going from a migrate pte
    to a normal pte, while the normal pte turns into a migrate pte. Is that right?
>  
> Agreed.
>
>> Could you try to see and share what happens on your workload with
>> Michal's patch?
> Zhong Jiang did you have chance to retest with the BUG_ON changed?
>
 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-26 14:04                                 ` zhong jiang
@ 2016-07-27 14:44                                   ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2016-07-27 14:44 UTC (permalink / raw)
  To: zhong jiang
  Cc: Naoya Horiguchi, akpm, qiuxishi, vbabka, mm-commits,
	Mike Kravetz, Mel Gorman, linux-mm

On Tue 26-07-16 22:04:16, zhong jiang wrote:
> On 2016/7/26 15:58, Michal Hocko wrote:
> > On Fri 22-07-16 07:17:37, Naoya Horiguchi wrote:
> > [...]
> >> I think that (src_pte != dst_pte) can happen and that's ok if there's no
> >> migration entry. 
> > We have discussed that with Naoya off-list and couldn't find a scenario
> > when parent would have !shared pmd while child would have it. The only
> > plausible scenario was that parent created and poppulated mapping smaller
> > than 1G and then enlarged it later on so the child would see sharedable
> > pud. This doesn't seem to be possible because vma_merge would bail out
> > due to VM_SPECIAL check.

> I do not understand that the process must have vm_special flags. if
> vm_special enable, the process must not be expanded. and what does it
> matter about vma_merge ??

See 
	if (vm_flags & VM_SPECIAL)
		return NULL;

in vma_merge.

> >> But even if we have both a normal entry and a migration entry
> >> for one hugepage, that still looks fine to me because the running migration
> >> operation fails (because mapcounts remain on the source hugepage),
> >> and all migration entries are turned back into normal entries pointing to the
> >> source hugepage.
>
> In one case, try_to_unmap_one runs first and succeeds, so the mapcount
> drops to zero. Then we take the pte lock; if src_pte != dst_pte, the
> dst_pte may go from a migration pte to a normal pte while the normal
> pte turns into a migration pte. Is that right?

I am sorry but I have a hard time following your arguments here. Could you
be more specific please?

> > Agreed.
> >
> >> Could you try to see and share what happens on your workload with
> >> Michal's patch?
> >
> > Zhong Jiang, did you have a chance to retest with the BUG_ON changed?

Any update on this?
-- 
Michal Hocko
SUSE Labs


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-21  7:43 ` + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree Michal Hocko
  2016-07-21  8:13   ` Naoya Horiguchi
  2016-07-21 10:54   ` zhong jiang
@ 2016-07-29 11:27   ` Michal Hocko
  2016-07-30  6:33     ` zhong jiang
  2 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-07-29 11:27 UTC (permalink / raw)
  To: akpm
  Cc: zhongjiang, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On Thu 21-07-16 09:43:40, Michal Hocko wrote:
> We have further discussed the patch and I believe it is not correct. See [1].
> I am proposing the following alternative.

Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
is clearly racy. Whether the BUG_ON update is really the right and
sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang's
testing.

> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
> ---
> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Jul 2016 09:28:13 +0200
> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
> 
> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
> runs his database load with memory online and offline running in
> parallel. The reason is that huge_pmd_share might detect a shared pmd
> which is currently migrated and so it has migration pte which is
> !pte_huge.
> 
> There doesn't seem to be any easy way to prevent the race and in
> fact seeing the migration swap entry is not harmful. Both callers of
> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
> will copy the swap entry and make it COW if needed. hugetlb_fault will
> back off and so the page fault is retried if the page is still under
> migration and waits for its completion in hugetlb_fault.
> 
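
For readers who want the caller behaviour spelled out, here is a rough
userspace model of the paragraph above. All type and function names
(toy_entry, toy_hugetlb_fault, toy_copy_range) are invented for this
sketch; the real hugetlb_fault and copy_hugetlb_page_range do far more.
It only captures that neither caller needs the slot to be a present huge
pte.

```c
/* Toy model of what a hugetlb page table slot can hold. */
enum toy_entry { ENTRY_NONE, ENTRY_PRESENT_HUGE, ENTRY_MIGRATION };

/* What the fault path does with the slot returned by huge_pte_alloc. */
enum fault_action {
	FAULT_ALLOCATE_PAGE,	/* empty slot: fault in a new huge page */
	FAULT_HANDLE_PRESENT,	/* present huge pte: normal handling */
	FAULT_WAIT_AND_RETRY	/* migration entry: wait, then retry */
};

/* What the fork path does when copying a slot from parent to child. */
enum copy_action {
	COPY_NOTHING,		/* empty slot: nothing to copy */
	COPY_MAP_SAME_PAGE,	/* present huge pte: child maps the page */
	COPY_SWAP_ENTRY		/* migration entry: copy the swap entry */
};

static enum fault_action toy_hugetlb_fault(enum toy_entry e)
{
	switch (e) {
	case ENTRY_MIGRATION:
		return FAULT_WAIT_AND_RETRY;
	case ENTRY_NONE:
		return FAULT_ALLOCATE_PAGE;
	default:
		return FAULT_HANDLE_PRESENT;
	}
}

static enum copy_action toy_copy_range(enum toy_entry e)
{
	switch (e) {
	case ENTRY_MIGRATION:
		return COPY_SWAP_ENTRY;
	case ENTRY_NONE:
		return COPY_NOTHING;
	default:
		return COPY_MAP_SAME_PAGE;
	}
}
```

In this model a migration entry never reaches the "present" branch of
either caller, which is the reason the old assertion is stricter than
necessary.
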
> That means that the BUG_ON is wrong and we should update it. Let's
> simply check that all present ptes are pte_huge instead.
> 
> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 34379d653aa3..31dd2b8b86b3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>  
>  	return pte;
>  }
> -- 
> 2.8.1
> 
> -- 
> Michal Hocko
> SUSE Labs
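
To spell out the difference between the two BUG_ON conditions, here is a
small userspace model. The toy_pte type and its helpers are made up for
illustration and only mimic the three states discussed in the changelog
(empty, present, migration entry); the real pte_none, pte_present and
pte_huge are architecture specific.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a page table entry; not the kernel's pte_t. */
enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_MIGRATION };

struct toy_pte {
	enum pte_kind kind;
	bool huge;	/* only meaningful for PTE_PRESENT */
};

static bool toy_pte_none(struct toy_pte p)
{
	return p.kind == PTE_NONE;
}

static bool toy_pte_present(struct toy_pte p)
{
	return p.kind == PTE_PRESENT;
}

static bool toy_pte_huge(struct toy_pte p)
{
	return p.kind == PTE_PRESENT && p.huge;
}

/* Condition of the BUG_ON before the patch. */
static bool old_check_fires(const struct toy_pte *pte)
{
	return pte && !toy_pte_none(*pte) && !toy_pte_huge(*pte);
}

/* Condition of the BUG_ON after the patch. */
static bool new_check_fires(const struct toy_pte *pte)
{
	return pte && toy_pte_present(*pte) && !toy_pte_huge(*pte);
}
```

For a migration entry the old condition fires because the entry is
neither empty nor huge, while the new one stays quiet because the entry
is not present. A present but non-huge entry still trips both.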

-- 
Michal Hocko
SUSE Labs


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-29 11:27   ` Michal Hocko
@ 2016-07-30  6:33     ` zhong jiang
  2016-08-01 11:02       ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-07-30  6:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/7/29 19:27, Michal Hocko wrote:
> On Thu 21-07-16 09:43:40, Michal Hocko wrote:
>> We have further discussed the patch and I believe it is not correct. See [1].
>> I am proposing the following alternative.
> Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
> is clearly racy. Whether the BUG_ON update is really the right and
> sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang's
> testing.
  The issue is very hard to reproduce. Even without applying any patch to the
  kernel code, it has not happened again so far.

  Thanks
  zhongjiang
>> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
>> ---
>> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
>> From: Michal Hocko <mhocko@suse.com>
>> Date: Thu, 21 Jul 2016 09:28:13 +0200
>> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
>>
>> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
>> runs his database load with memory online and offline running in
>> parallel. The reason is that huge_pmd_share might detect a shared pmd
>> which is currently migrated and so it has migration pte which is
>> !pte_huge.
>>
>> There doesn't seem to be any easy way to prevent the race and in
>> fact seeing the migration swap entry is not harmful. Both callers of
>> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
>> will copy the swap entry and make it COW if needed. hugetlb_fault will
>> back off and so the page fault is retried if the page is still under
>> migration and waits for its completion in hugetlb_fault.
>>
>> That means that the BUG_ON is wrong and we should update it. Let's
>> simply check that all present ptes are pte_huge instead.
>>
>> Reported-by: zhongjiang <zhongjiang@huawei.com>
>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>> ---
>>  mm/hugetlb.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 34379d653aa3..31dd2b8b86b3 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>  		}
>>  	}
>> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>>  
>>  	return pte;
>>  }
>> -- 
>> 2.8.1
>>
>> -- 
>> Michal Hocko
>> SUSE Labs



* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-07-30  6:33     ` zhong jiang
@ 2016-08-01 11:02       ` Michal Hocko
  2016-08-01 15:04         ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-08-01 11:02 UTC (permalink / raw)
  To: zhong jiang
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On Sat 30-07-16 14:33:18, zhong jiang wrote:
> On 2016/7/29 19:27, Michal Hocko wrote:
> > On Thu 21-07-16 09:43:40, Michal Hocko wrote:
> >> We have further discussed the patch and I believe it is not correct. See [1].
> >> I am proposing the following alternative.
> > Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
> > is clearly racy. Whether the BUG_ON update is really the right and
> > sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang's
> > testing.
>
> The issue is very hard to reproduce. Even without applying any patch
> to the kernel code, it has not happened again so far.

Hmm, OK. So what do you propose? Are you OK with the BUG_ON change or do
you think that this needs a deeper fix?
-- 
Michal Hocko
SUSE Labs


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-08-01 11:02       ` Michal Hocko
@ 2016-08-01 15:04         ` zhong jiang
  2016-08-01 15:31           ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-08-01 15:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On 2016/8/1 19:02, Michal Hocko wrote:
> On Sat 30-07-16 14:33:18, zhong jiang wrote:
>> On 2016/7/29 19:27, Michal Hocko wrote:
>>> On Thu 21-07-16 09:43:40, Michal Hocko wrote:
>>>> We have further discussed the patch and I believe it is not correct. See [1].
>>>> I am proposing the following alternative.
>>> Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
>>> is clearly racy. Whether the BUG_ON update is really the right and
>>> sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang's
>>> testing.
>> The issue is very hard to reproduce. Even without applying any patch
>> to the kernel code, it has not happened again so far.
> Hmm, OK. So what do you propose? Are you OK with the BUG_ON change or do
> you think that this needs a deeper fix?
  Yes, I agree with your change.



* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
  2016-08-01 15:04         ` zhong jiang
@ 2016-08-01 15:31           ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2016-08-01 15:31 UTC (permalink / raw)
  To: akpm
  Cc: zhong jiang, qiuxishi, vbabka, mm-commits, Mike Kravetz,
	Naoya Horiguchi, Mel Gorman, linux-mm

On Mon 01-08-16 23:04:01, zhong jiang wrote:
> On 2016/8/1 19:02, Michal Hocko wrote:
> > On Sat 30-07-16 14:33:18, zhong jiang wrote:
> >> On 2016/7/29 19:27, Michal Hocko wrote:
> >>> On Thu 21-07-16 09:43:40, Michal Hocko wrote:
> >>>> We have further discussed the patch and I believe it is not correct. See [1].
> >>>> I am proposing the following alternative.
> >>> Andrew, please drop the mm-hugetlb-fix-race-when-migrate-pages.patch. It
> >>> is clearly racy. Whether the BUG_ON update is really the right and
> >>> sufficient fix is not 100% clear yet and we are waiting for Zhong Jiang's
> >>> testing.
> >> The issue is very hard to reproduce. Even without applying any patch
> >> to the kernel code, it has not happened again so far.
> > Hmm, OK. So what do you propose? Are you OK with the BUG_ON change or do
> > you think that this needs a deeper fix?
>
>   Yes, I agree with your change.

OK, Andrew, could you merge
http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.cz with ack
from Naoya
http://lkml.kernel.org/r/20160721081355.GB25398@hori1.linux.bs1.fc.nec.co.jp

Thanks!

-- 
Michal Hocko
SUSE Labs


* Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
       [not found] <003701d1e328$202ca9d0$6085fd70$@alibaba-inc.com>
@ 2016-07-21  8:19 ` Hillf Danton
  0 siblings, 0 replies; 26+ messages in thread
From: Hillf Danton @ 2016-07-21  8:19 UTC (permalink / raw)
  To: Michal Hocko; +Cc: 'zhongjiang', linux-kernel, linux-mm, Andrew Morton

> 
> From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Jul 2016 09:28:13 +0200
> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
> 
> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
> runs his database load with memory online and offline running in
> parallel. The reason is that huge_pmd_share might detect a shared pmd
> which is currently migrated and so it has migration pte which is
> !pte_huge.
> 
> There doesn't seem to be any easy way to prevent the race and in
> fact seeing the migration swap entry is not harmful. Both callers of
> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
> will copy the swap entry and make it COW if needed. hugetlb_fault will
> back off and so the page fault is retried if the page is still under
> migration and waits for its completion in hugetlb_fault.
> 
> That means that the BUG_ON is wrong and we should update it. Let's
> simply check that all present ptes are pte_huge instead.
> 
> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>

>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 34379d653aa3..31dd2b8b86b3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
> 
>  	return pte;
>  }
> --
> 2.8.1
> 



end of thread, other threads:[~2016-08-01 15:31 UTC | newest]

Thread overview: 26+ messages
     [not found] <578eb28b.YbRUDGz5RloTVlrE%akpm@linux-foundation.org>
2016-07-21  7:43 ` + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree Michal Hocko
2016-07-21  8:13   ` Naoya Horiguchi
2016-07-21 10:29     ` Michal Hocko
2016-07-21 10:54   ` zhong jiang
2016-07-21 11:27     ` Michal Hocko
2016-07-21 12:14       ` zhong jiang
2016-07-21 12:30         ` Michal Hocko
2016-07-21 12:45           ` zhong jiang
2016-07-21 12:55             ` Michal Hocko
2016-07-21 13:25               ` zhong jiang
2016-07-21 13:40                 ` Michal Hocko
2016-07-21 13:58                   ` zhong jiang
2016-07-21 14:01                     ` Michal Hocko
2016-07-21 14:13                       ` zhong jiang
2016-07-21 14:27                         ` Michal Hocko
2016-07-21 14:33                           ` zhong jiang
2016-07-22  7:17                             ` Naoya Horiguchi
2016-07-26  7:58                               ` Michal Hocko
2016-07-26 14:04                                 ` zhong jiang
2016-07-27 14:44                                   ` Michal Hocko
2016-07-29 11:27   ` Michal Hocko
2016-07-30  6:33     ` zhong jiang
2016-08-01 11:02       ` Michal Hocko
2016-08-01 15:04         ` zhong jiang
2016-08-01 15:31           ` Michal Hocko
     [not found] <003701d1e328$202ca9d0$6085fd70$@alibaba-inc.com>
2016-07-21  8:19 ` Hillf Danton
