linux-mm.kvack.org archive mirror
* [PATCH v2] mm/hugetlb: fix race when migrate pages
@ 2016-07-19 13:04 zhongjiang
  0 siblings, 0 replies; 9+ messages in thread
From: zhongjiang @ 2016-07-19 13:04 UTC (permalink / raw)
  To: mhocko, vbabka, qiuxishi, akpm; +Cc: linux-mm

From: zhong jiang <zhongjiang@huawei.com>

I hit the following BUG_ON in huge_pte_alloc while running a database
workload and onlining/offlining memory on the system:

BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));

When the pmd sharing function is enabled, we may obtain a shared pmd
entry. Due to the ongoing memory offlining, the page the pmd entry
points to may be turned into a migration entry, and the BUG_ON fires.

The patch fixes it by checking the pmd entry once we take the lock:
if the page the shared pmd entry points to is under migration, we
should allocate a new pmd entry instead.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6384dfd..3454051 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4213,7 +4213,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	struct vm_area_struct *svma;
 	unsigned long saddr;
 	pte_t *spte = NULL;
-	pte_t *pte;
+	pte_t *pte, entry;
 	spinlock_t *ptl;
 
 	if (!vma_shareable(vma, addr))
@@ -4240,6 +4240,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 
 	ptl = huge_pte_lockptr(hstate_vma(vma), mm, spte);
 	spin_lock(ptl);
+	entry = huge_ptep_get(spte);
+	if (is_hugetlb_entry_migration(entry) ||
+			is_hugetlb_entry_hwpoisoned(entry)) {
+		goto out_unlock;
+	}
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
@@ -4247,6 +4252,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 		put_page(virt_to_page(spte));
 		mm_dec_nr_pmds(mm);
 	}
+out_unlock:
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1


* Re: [PATCH v2] mm/hugetlb: fix race when migrate pages
  2016-07-20 13:24           ` Michal Hocko
@ 2016-07-21  9:25             ` zhong jiang
  0 siblings, 0 replies; 9+ messages in thread
From: zhong jiang @ 2016-07-21  9:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: vbabka, qiuxishi, akpm, linux-mm, Mike Kravetz, Naoya Horiguchi,
	Mel Gorman

On 2016/7/20 21:24, Michal Hocko wrote:
> [Sorry for spamming, I thought my headache would force me not to stare
>  into the code any more, but I couldn't resist]
> On Wed 20-07-16 15:00:55, Michal Hocko wrote:
>> On Wed 20-07-16 14:45:01, Michal Hocko wrote:
>> [...]
>>> I was talking to Mel (CCed) and he has raised a good question: if you
>>> encounter a page under migration and fail to share the pmd with it, how
>>> can you end up with the proper contents of the target page?
>> Hmm, I was staring into the code some more and it seems this would be OK
>> because we should hit hugetlb_no_page with the newly instantiated pmd
>> and associate it with a page from the radix tree. So unless I am missing
>> something the corruption shouldn't be possible.
>>
>> I believe the post pmd_populate race is still there, though, so I
>> believe the approach should be rethought.
> So I think the swap entry is OK in fact. huge_pte_alloc would return a
> shared pmd which would just happen to be a swap entry. hugetlb_fault would
> then recognize that by:
> 	/*
> 	 * entry could be a migration/hwpoison entry at this point, so this
> 	 * check prevents the kernel from going below assuming that we have
> 	 * a active hugepage in pagecache. This goto expects the 2nd page fault,
> 	 * and is_hugetlb_entry_(migration|hwpoisoned) check will properly
> 	 * handle it.
> 	 */
> 	if (!pte_present(entry))
> 		goto out_mutex;
>
> We would get back from the page fault and if the page was still under
> migration on the retry we would end up waiting for the migration entry
> on the next fault.
>
> 	ptep = huge_pte_offset(mm, address);
> 	if (ptep) {
> 		entry = huge_ptep_get(ptep);
> 		if (unlikely(is_hugetlb_entry_migration(entry))) {
> 			migration_entry_wait_huge(vma, mm, ptep);
> 			return 0;
>
> Or am I missing something? If not we should just reconsider the BUG_ON
> but I still have to think whether it is useful/needed at all.
  You are right. I agree.


* Re: [PATCH v2] mm/hugetlb: fix race when migrate pages
  2016-07-20 13:00         ` Michal Hocko
@ 2016-07-20 13:24           ` Michal Hocko
  2016-07-21  9:25             ` zhong jiang
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-20 13:24 UTC (permalink / raw)
  To: zhong jiang
  Cc: vbabka, qiuxishi, akpm, linux-mm, Mike Kravetz, Naoya Horiguchi,
	Mel Gorman

[Sorry for spamming, I thought my headache would force me not to stare
 into the code any more, but I couldn't resist]
On Wed 20-07-16 15:00:55, Michal Hocko wrote:
> On Wed 20-07-16 14:45:01, Michal Hocko wrote:
> [...]
> > I was talking to Mel (CCed) and he has raised a good question: if you
> > encounter a page under migration and fail to share the pmd with it, how
> > can you end up with the proper contents of the target page?
> 
> Hmm, I was staring into the code some more and it seems this would be OK
> because we should hit hugetlb_no_page with the newly instantiated pmd
> and associate it with a page from the radix tree. So unless I am missing
> something the corruption shouldn't be possible.
> 
> I believe the post pmd_populate race is still there, though, so I
> believe the approach should be rethought.

So I think the swap entry is OK in fact. huge_pte_alloc would return a
shared pmd which would just happen to be a swap entry. hugetlb_fault would
then recognize that by:
	/*
	 * entry could be a migration/hwpoison entry at this point, so this
	 * check prevents the kernel from going below assuming that we have
	 * a active hugepage in pagecache. This goto expects the 2nd page fault,
	 * and is_hugetlb_entry_(migration|hwpoisoned) check will properly
	 * handle it.
	 */
	if (!pte_present(entry))
		goto out_mutex;

We would get back from the page fault and if the page was still under
migration on the retry we would end up waiting for the migration entry
on the next fault.

	ptep = huge_pte_offset(mm, address);
	if (ptep) {
		entry = huge_ptep_get(ptep);
		if (unlikely(is_hugetlb_entry_migration(entry))) {
			migration_entry_wait_huge(vma, mm, ptep);
			return 0;

Or am I missing something? If not we should just reconsider the BUG_ON
but I still have to think whether it is useful/needed at all.
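
One way to relax it (an illustrative sketch only, not a patch posted in
this thread) would be to tolerate non-present entries, which is what the
migration and hwpoison swap entries are:

	/*
	 * Illustrative only: complain just about present-but-not-huge
	 * entries; non-present (migration/hwpoison) entries are left
	 * for hugetlb_fault() to sort out on the next fault.
	 */
	BUG_ON(pte && !pte_none(*pte) &&
	       pte_present(*pte) && !pte_huge(*pte));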
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2] mm/hugetlb: fix race when migrate pages
  2016-07-20 12:45       ` Michal Hocko
@ 2016-07-20 13:00         ` Michal Hocko
  2016-07-20 13:24           ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-20 13:00 UTC (permalink / raw)
  To: zhong jiang
  Cc: vbabka, qiuxishi, akpm, linux-mm, Mike Kravetz, Naoya Horiguchi,
	Mel Gorman

On Wed 20-07-16 14:45:01, Michal Hocko wrote:
[...]
> I was talking to Mel (CCed) and he has raised a good question: if you
> encounter a page under migration and fail to share the pmd with it, how
> can you end up with the proper contents of the target page?

Hmm, I was staring into the code some more and it seems this would be OK
because we should hit hugetlb_no_page with the newly instantiated pmd
and associate it with a page from the radix tree. So unless I am missing
something the corruption shouldn't be possible.

I believe the post pmd_populate race is still there, though, so I
believe the approach should be rethought.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2] mm/hugetlb: fix race when migrate pages
  2016-07-20 12:16     ` Michal Hocko
@ 2016-07-20 12:45       ` Michal Hocko
  2016-07-20 13:00         ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-20 12:45 UTC (permalink / raw)
  To: zhong jiang
  Cc: vbabka, qiuxishi, akpm, linux-mm, Mike Kravetz, Naoya Horiguchi,
	Mel Gorman

On Wed 20-07-16 14:16:45, Michal Hocko wrote:
> On Wed 20-07-16 18:03:40, zhong jiang wrote:
> > On 2016/7/20 15:38, Michal Hocko wrote:
> > > [CC Mike and Naoya]
> > > On Tue 19-07-16 21:45:58, zhongjiang wrote:
> > >> From: zhong jiang <zhongjiang@huawei.com>
> > >>
> > >> I hit the following BUG_ON in huge_pte_alloc while running a database
> > >> workload and onlining/offlining memory on the system:
> > >>
> > >> BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> > >>
> > >> When the pmd sharing function is enabled, we may obtain a shared pmd
> > >> entry. Due to the ongoing memory offlining, the page the pmd entry
> > >> points to may be turned into a migration entry, and the BUG_ON fires.
> > >>
> > >> The patch fixes it by checking the pmd entry once we take the lock:
> > >> if the page the shared pmd entry points to is under migration, we
> > >> should allocate a new pmd entry instead.
> > >
> > > I am still not 100% sure this is correct. Does huge_pte_lockptr work
> > > properly for the migration swap entry?
> 
> What about this part?
> 
> > > If yes, and we populate the pud
> > > with a migration entry, then is it really bad/harmful (other than hitting
> > > the BUG_ON, which might be updated to handle that case)? This might be a
> > > stupid question, sorry about that, but I really have a problem grasping
> > > the whole issue properly, and the changelog didn't help me much. I would
> > > really appreciate some clarification here. The pmd sharing code is clear
> > > as mud and adding new tweaks there doesn't sound like it would make it
> > > more clear.
> >
> >     OK, maybe the following explanation will be clearer:
> >
> >     cpu0                                cpu1
> >     try_to_unmap_one                    huge_pmd_share
> >       page_check_address                  huge_pte_lockptr
> >         spin_lock
> >         (page entry can be set to a
> >          migration or hwpoison entry)
> >         pte_unmap_unlock
> >                                           spin_lock
> >                                           (page entry has changed)
> 
> Well, but this is just one possible race. We can also race after
> pud_populate:
> 
> cpu0				cpu1
> try_to_unmap_one			huge_pmd_share
>   page_check_address		  huge_pte_lockptr
>   				    spin_lock
> 				    pud_populate
> 				    spin_unlock
>   spin_lock(ptl)
>   set_migration_entry
>   spin_unlock()
>   				  pmd_alloc # we will get migration entry
> 
> So unless I am missing something here we either have to be able to cope
> with migration entries somehow already or we need some additional care
> for shared pmd entries (aka shared pud page) for this to behave
> properly.

I was talking to Mel (CCed) and he has raised a good question: if you
encounter a page under migration and fail to share the pmd with it, how
can you end up with the proper contents of the target page? So not only
does the patch not catch the other race, I believe it can lead to
corruption (which is my fault, because your previous version did retry
rather than fail). But anyway, the primary question is whether seeing a
migration entry here is really a problem and just the BUG_ON needs to be
fixed.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2] mm/hugetlb: fix race when migrate pages
  2016-07-20 10:03   ` zhong jiang
@ 2016-07-20 12:16     ` Michal Hocko
  2016-07-20 12:45       ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-20 12:16 UTC (permalink / raw)
  To: zhong jiang
  Cc: vbabka, qiuxishi, akpm, linux-mm, Mike Kravetz, Naoya Horiguchi

On Wed 20-07-16 18:03:40, zhong jiang wrote:
> On 2016/7/20 15:38, Michal Hocko wrote:
> > [CC Mike and Naoya]
> > On Tue 19-07-16 21:45:58, zhongjiang wrote:
> >> From: zhong jiang <zhongjiang@huawei.com>
> >>
> >> I hit the following BUG_ON in huge_pte_alloc while running a database
> >> workload and onlining/offlining memory on the system:
> >>
> >> BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
> >>
> >> When the pmd sharing function is enabled, we may obtain a shared pmd
> >> entry. Due to the ongoing memory offlining, the page the pmd entry
> >> points to may be turned into a migration entry, and the BUG_ON fires.
> >>
> >> The patch fixes it by checking the pmd entry once we take the lock:
> >> if the page the shared pmd entry points to is under migration, we
> >> should allocate a new pmd entry instead.
> >
> > I am still not 100% sure this is correct. Does huge_pte_lockptr work
> > properly for the migration swap entry?

What about this part?

> > If yes, and we populate the pud
> > with a migration entry, then is it really bad/harmful (other than hitting
> > the BUG_ON, which might be updated to handle that case)? This might be a
> > stupid question, sorry about that, but I really have a problem grasping
> > the whole issue properly, and the changelog didn't help me much. I would
> > really appreciate some clarification here. The pmd sharing code is clear
> > as mud and adding new tweaks there doesn't sound like it would make it
> > more clear.
>
>     OK, maybe the following explanation will be clearer:
>
>     cpu0                                cpu1
>     try_to_unmap_one                    huge_pmd_share
>       page_check_address                  huge_pte_lockptr
>         spin_lock
>         (page entry can be set to a
>          migration or hwpoison entry)
>         pte_unmap_unlock
>                                           spin_lock
>                                           (page entry has changed)

Well, but this is just one possible race. We can also race after
pud_populate:

cpu0				cpu1
try_to_unmap_one		huge_pmd_share
  page_check_address		  huge_pte_lockptr
  				    spin_lock
				    pud_populate
				    spin_unlock
  spin_lock(ptl)
  set_migration_entry
  spin_unlock()
  				  pmd_alloc # we will get migration entry

So unless I am missing something here we either have to be able to cope
with migration entries somehow already or we need some additional care
for shared pmd entries (aka shared pud page) for this to behave
properly.
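
For context, here is roughly where the BUG_ON sits (a sketch based on the
4.7-era mm/hugetlb.c, so details may differ slightly). A migration entry
is neither none nor a huge pte, which is exactly what trips it:

pte_t *huge_pte_alloc(struct mm_struct *mm,
			unsigned long addr, unsigned long sz)
{
	pgd_t *pgd;
	pud_t *pud;
	pte_t *pte = NULL;

	pgd = pgd_offset(mm, addr);
	pud = pud_alloc(mm, pgd, addr);
	if (pud) {
		if (sz == PUD_SIZE) {
			pte = (pte_t *)pud;
		} else {
			BUG_ON(sz != PMD_SIZE);
			if (want_pmd_share() && pud_none(*pud))
				pte = huge_pmd_share(mm, addr, pud);
			else
				pte = (pte_t *)pmd_alloc(mm, pud, addr);
		}
	}
	/* a migration entry is !pte_none() and !pte_huge() here */
	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));

	return pte;
}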
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2] mm/hugetlb: fix race when migrate pages
  2016-07-20  7:38 ` Michal Hocko
@ 2016-07-20 10:03   ` zhong jiang
  2016-07-20 12:16     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: zhong jiang @ 2016-07-20 10:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: vbabka, qiuxishi, akpm, linux-mm, Mike Kravetz, Naoya Horiguchi

On 2016/7/20 15:38, Michal Hocko wrote:
> [CC Mike and Naoya]
> On Tue 19-07-16 21:45:58, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> I hit the following BUG_ON in huge_pte_alloc while running a database
>> workload and onlining/offlining memory on the system:
>>
>> BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>>
>> When the pmd sharing function is enabled, we may obtain a shared pmd
>> entry. Due to the ongoing memory offlining, the page the pmd entry
>> points to may be turned into a migration entry, and the BUG_ON fires.
>>
>> The patch fixes it by checking the pmd entry once we take the lock:
>> if the page the shared pmd entry points to is under migration, we
>> should allocate a new pmd entry instead.
> I am still not 100% sure this is correct. Does huge_pte_lockptr work
> properly for the migration swap entry? If yes, and we populate the pud
> with a migration entry, then is it really bad/harmful (other than hitting
> the BUG_ON, which might be updated to handle that case)? This might be a
> stupid question, sorry about that, but I really have a problem grasping
> the whole issue properly, and the changelog didn't help me much. I would
> really appreciate some clarification here. The pmd sharing code is clear
> as mud and adding new tweaks there doesn't sound like it would make it
> more clear.
    OK, maybe the following explanation will be clearer:

    cpu0                                cpu1
    try_to_unmap_one                    huge_pmd_share
      page_check_address                  huge_pte_lockptr
        spin_lock
        (page entry can be set to a
         migration or hwpoison entry)
        pte_unmap_unlock
                                          spin_lock
                                          (page entry has changed)
> Also is the hwpoison check really needed?
  Yes, the page can be poisoned before the spin_lock in try_to_unmap_one, and
  we can see that it will also set the page entry to a hwpoison entry if
  PageHWPoison(page) is true.
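
  For reference, the relevant branch (a sketch of the era's
  try_to_unmap_one() in mm/rmap.c, so details may differ) installs the
  hwpoison swap entry under the same pte lock we race with:

	if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
		if (!PageHuge(page)) {
			if (PageAnon(page))
				dec_mm_counter(mm, MM_ANONPAGES);
			else
				dec_mm_counter(mm, mm_counter_file(page));
		}
		/* this is the entry huge_pmd_share() can then observe */
		set_pte_at(mm, address, pte,
			   swp_entry_to_pte(make_hwpoison_entry(page)));
	}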
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  mm/hugetlb.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 6384dfd..797db55 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4213,7 +4213,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>  	struct vm_area_struct *svma;
>>  	unsigned long saddr;
>>  	pte_t *spte = NULL;
>> -	pte_t *pte;
>> +	pte_t *pte, entry;
>>  	spinlock_t *ptl;
>>  
>>  	if (!vma_shareable(vma, addr))
>> @@ -4240,6 +4240,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>  
>>  	ptl = huge_pte_lockptr(hstate_vma(vma), mm, spte);
>>  	spin_lock(ptl);
>> +	entry = huge_ptep_get(spte);
>> +	if (is_hugetlb_entry_migration(entry) ||
>> +			is_hugetlb_entry_hwpoisoned(entry)) {
>> +		goto out_unlock;
>> +	}
>>  	if (pud_none(*pud)) {
>>  		pud_populate(mm, pud,
>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>> @@ -4247,6 +4252,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>  		put_page(virt_to_page(spte));
>>  		mm_dec_nr_pmds(mm);
>>  	}
>> +
>> +out_unlock:
>>  	spin_unlock(ptl);
>>  out:
>>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
>> -- 
>> 1.8.3.1



* Re: [PATCH v2] mm/hugetlb: fix race when migrate pages
  2016-07-19 13:45 zhongjiang
@ 2016-07-20  7:38 ` Michal Hocko
  2016-07-20 10:03   ` zhong jiang
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-20  7:38 UTC (permalink / raw)
  To: zhongjiang
  Cc: vbabka, qiuxishi, akpm, linux-mm, Mike Kravetz, Naoya Horiguchi

[CC Mike and Naoya]
On Tue 19-07-16 21:45:58, zhongjiang wrote:
> From: zhong jiang <zhongjiang@huawei.com>
> 
> I hit the following BUG_ON in huge_pte_alloc while running a database
> workload and onlining/offlining memory on the system:
>
> BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>
> When the pmd sharing function is enabled, we may obtain a shared pmd
> entry. Due to the ongoing memory offlining, the page the pmd entry
> points to may be turned into a migration entry, and the BUG_ON fires.
>
> The patch fixes it by checking the pmd entry once we take the lock:
> if the page the shared pmd entry points to is under migration, we
> should allocate a new pmd entry instead.

I am still not 100% sure this is correct. Does huge_pte_lockptr work
properly for the migration swap entry? If yes, and we populate the pud
with a migration entry, then is it really bad/harmful (other than hitting
the BUG_ON, which might be updated to handle that case)? This might be a
stupid question, sorry about that, but I really have a problem grasping
the whole issue properly, and the changelog didn't help me much. I would
really appreciate some clarification here. The pmd sharing code is clear
as mud and adding new tweaks there doesn't sound like it would make it
more clear.
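
For what it's worth, huge_pte_lockptr (a sketch of the era's helper from
include/linux/hugetlb.h, so details may differ) derives the lock from the
page backing the pmd itself, not from the entry's contents, so a migration
swap entry in the pmd should not confuse it:

static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
					struct mm_struct *mm, pte_t *pte)
{
	if (huge_page_size(h) == PMD_SIZE)
		return pmd_lockptr(mm, (pmd_t *) pte);
	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
	return &mm->page_table_lock;
}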

Also is the hwpoison check really needed?

> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  mm/hugetlb.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6384dfd..797db55 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4213,7 +4213,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  	struct vm_area_struct *svma;
>  	unsigned long saddr;
>  	pte_t *spte = NULL;
> -	pte_t *pte;
> +	pte_t *pte, entry;
>  	spinlock_t *ptl;
>  
>  	if (!vma_shareable(vma, addr))
> @@ -4240,6 +4240,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  
>  	ptl = huge_pte_lockptr(hstate_vma(vma), mm, spte);
>  	spin_lock(ptl);
> +	entry = huge_ptep_get(spte);
> +	if (is_hugetlb_entry_migration(entry) ||
> +			is_hugetlb_entry_hwpoisoned(entry)) {
> +		goto out_unlock;
> +	}
>  	if (pud_none(*pud)) {
>  		pud_populate(mm, pud,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> @@ -4247,6 +4252,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  		put_page(virt_to_page(spte));
>  		mm_dec_nr_pmds(mm);
>  	}
> +
> +out_unlock:
>  	spin_unlock(ptl);
>  out:
>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


* [PATCH v2] mm/hugetlb: fix race when migrate pages
@ 2016-07-19 13:45 zhongjiang
  2016-07-20  7:38 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: zhongjiang @ 2016-07-19 13:45 UTC (permalink / raw)
  To: mhocko, vbabka, qiuxishi, akpm; +Cc: linux-mm

From: zhong jiang <zhongjiang@huawei.com>

I hit the following BUG_ON in huge_pte_alloc while running a database
workload and onlining/offlining memory on the system:

BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));

When the pmd sharing function is enabled, we may obtain a shared pmd
entry. Due to the ongoing memory offlining, the page the pmd entry
points to may be turned into a migration entry, and the BUG_ON fires.

The patch fixes it by checking the pmd entry once we take the lock:
if the page the shared pmd entry points to is under migration, we
should allocate a new pmd entry instead.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6384dfd..797db55 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4213,7 +4213,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	struct vm_area_struct *svma;
 	unsigned long saddr;
 	pte_t *spte = NULL;
-	pte_t *pte;
+	pte_t *pte, entry;
 	spinlock_t *ptl;
 
 	if (!vma_shareable(vma, addr))
@@ -4240,6 +4240,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 
 	ptl = huge_pte_lockptr(hstate_vma(vma), mm, spte);
 	spin_lock(ptl);
+	entry = huge_ptep_get(spte);
+	if (is_hugetlb_entry_migration(entry) ||
+			is_hugetlb_entry_hwpoisoned(entry)) {
+		goto out_unlock;
+	}
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
@@ -4247,6 +4252,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 		put_page(virt_to_page(spte));
 		mm_dec_nr_pmds(mm);
 	}
+
+out_unlock:
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1


end of thread, other threads:[~2016-07-21  9:36 UTC | newest]

Thread overview: 9+ messages
2016-07-19 13:04 [PATCH v2] mm/hugetlb: fix race when migrate pages zhongjiang
2016-07-19 13:45 zhongjiang
2016-07-20  7:38 ` Michal Hocko
2016-07-20 10:03   ` zhong jiang
2016-07-20 12:16     ` Michal Hocko
2016-07-20 12:45       ` Michal Hocko
2016-07-20 13:00         ` Michal Hocko
2016-07-20 13:24           ` Michal Hocko
2016-07-21  9:25             ` zhong jiang
