From: zhong jiang <zhongjiang@huawei.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, qiuxishi@huawei.com, vbabka@suse.cz,
	mm-commits@vger.kernel.org,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org
Subject: Re: + mm-hugetlb-fix-race-when-migrate-pages.patch added to -mm tree
Date: Thu, 21 Jul 2016 21:25:38 +0800
Message-ID: <5790CD52.6050200@huawei.com>
In-Reply-To: <20160721125555.GJ26379@dhcp22.suse.cz>

On 2016/7/21 20:55, Michal Hocko wrote:
> On Thu 21-07-16 20:45:15, zhong jiang wrote:
>> On 2016/7/21 20:30, Michal Hocko wrote:
>>> On Thu 21-07-16 20:14:41, zhong jiang wrote:
>>>> On 2016/7/21 19:27, Michal Hocko wrote:
>>>>> On Thu 21-07-16 18:54:09, zhong jiang wrote:
>>>>>> On 2016/7/21 15:43, Michal Hocko wrote:
>>>>>>> We have further discussed the patch and I believe it is not correct. See [1].
>>>>>>> I am proposing the following alternative.
>>>>>>>
>>>>>>> [1] http://lkml.kernel.org/r/20160720132431.GM11249@dhcp22.suse.cz
>>>>>>> ---
>>>>>>> >From b1e9b3214f1859fdf7d134cdcb56f5871933539c Mon Sep 17 00:00:00 2001
>>>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>>>> Date: Thu, 21 Jul 2016 09:28:13 +0200
>>>>>>> Subject: [PATCH] mm, hugetlb: fix huge_pte_alloc BUG_ON
>>>>>>>
>>>>>>> Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
>>>>>>> runs his database load with memory online and offline running in
>>>>>>> parallel. The reason is that huge_pmd_share might detect a shared pmd
>>>>>>> which is currently migrated and so it has migration pte which is
>>>>>>> !pte_huge.
>>>>>>>
>>>>>>> There doesn't seem to be any easy way to prevent from the race and in
>>>>>>> fact seeing the migration swap entry is not harmful. Both callers of
>>>>>>> huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
>>>>>>> will copy the swap entry and make it COW if needed. hugetlb_fault will
>>>>>>> back off and so the page fault is retried if the page is still under
>>>>>>> migration and waits for its completion in hugetlb_fault.
>>>>>>>
>>>>>>> That means that the BUG_ON is wrong and we should update it. Let's
>>>>>>> simply check that all present ptes are pte_huge instead.
>>>>>>>
>>>>>>> Reported-by: zhongjiang <zhongjiang@huawei.com>
>>>>>>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>>>>>>> ---
>>>>>>>  mm/hugetlb.c | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>> index 34379d653aa3..31dd2b8b86b3 100644
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -4303,7 +4303,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>>>>>>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>>>>>>  		}
>>>>>>>  	}
>>>>>>> -	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>>>>>>> +	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>>>>>>>  
>>>>>>>  	return pte;
>>>>>>>  }
>>>>>> I don't think the patch fixes the problem. My explanation is as follows.
>>>>>>
>>>>>>   cpu0                                        cpu1
>>>>>>   copy_hugetlb_page_range                     try_to_unmap_one
>>>>>>     huge_pte_alloc    # pmd may be shared
>>>>>>     lock dst_pte      # dst_pte may be a migration entry
>>>>>>     lock src_pte      # src_pte may be a normal pte
>>>>>>     set_huge_pte_at   # dst_pte now points to a normal pte
>>>>>>     spin_unlock(src_ptl)
>>>>>>                                                 lock src_pte
>>>>>>     spin_unlock(dst_ptl)                        set src_pte to a migration entry
>>>>>>                                                 spin_unlock(src_pte)
>>>>>>
>>>>>>   * dst_pte is a normal pte, but the corresponding pfn is under
>>>>>>     migration. That is dangerous.
>>>>>>
>>>>>> Can this race occur? If the scenario exists, we should think about it more.
>>>>> Can this happen at all? copy_hugetlb_page_range does the following to
>>>>> rule out shared page table entries. At least that is my understanding of
>>>>> c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
>>>>>
>>>>> 		/* If the pagetables are shared don't copy or take references */
>>>>> 		if (dst_pte == src_pte)
>>>>> 			continue;
>>>> The mapping that vm_file points to may be shared. I am not sure, but
>>>> if it is, the possibility exists. Of course, src_pte is the same as
>>>> the dst_pte.
>>> I am not sure I understand. This is a fork path where the ptes are
>>> copied over from the parent to the child. So how would vm_file differ?
>> I think you may have misunderstood me. The mapping that a file refers
>> to can be shared by other processes. The parent process has the
>> mapping, but it is not the only one. This is only my view. Is that right?
> OK, now I understand what you mean. So you mean that a different process
> initiates the migration while this path copies to pte. That is certainly
> possible but I still fail to see what is the problem about that.
> huge_pte_alloc will return the identical pte whether it is regular or
> migration one. So what exactly is the problem?
>
copy_hugetlb_page_range obtains a shared dst_pte, which may not be equal to src_pte.
The dst_pte can come from another process sharing the mapping.

		/* If the pagetables are shared don't copy or take references */
		if (dst_pte == src_pte)
			continue;
 
Even in the fork path, we scan i_mmap to find a pte that can be shared. I think
that dst_pte may come from another process that is not the parent. In that case
dst_pte is not equal to the src_pte from the parent.
     
    vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
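
To make the question concrete, here is a minimal user-space sketch of the scan I have in mind. It is not kernel code: struct model_vma, model_pmd_share() and model_fork_shares_with_parent() are hypothetical names, and the sketch simply assumes that another process on i_mmap has a PMD page distinct from the parent's. Whether that can really happen is what I am asking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma found on mapping->i_mmap; not a kernel type. */
struct model_vma {
	const char *owner;	/* which process maps the file */
	long *pmd_page;		/* PMD page covering the range, NULL if not populated */
};

/*
 * Model of the scan: return the PMD page of the first other vma that
 * already has one, or NULL if there is nothing to share.
 */
static long *model_pmd_share(const struct model_vma *i_mmap, size_t n,
			     const struct model_vma *self)
{
	for (size_t i = 0; i < n; i++) {
		const struct model_vma *svma = &i_mmap[i];

		if (svma == self || !svma->pmd_page)
			continue;
		return svma->pmd_page;
	}
	return NULL;
}

/* Returns 1 if the child's dst_pte equals the parent's src_pte, 0 otherwise. */
static int model_fork_shares_with_parent(void)
{
	/* Assumption: the other process has its own PMD page. */
	static long other_pmd[1], parent_pmd[1];
	const struct model_vma i_mmap[] = {
		{ "other",  other_pmd },
		{ "parent", parent_pmd },
		{ "child",  NULL },
	};
	long *src_pte = parent_pmd;
	long *dst_pte = model_pmd_share(i_mmap, 3, &i_mmap[2]);

	assert(dst_pte != NULL);
	return dst_pte == src_pte;
}
```

Under that assumption, model_fork_shares_with_parent() returns 0: the scan stops at the first populated vma, which here belongs to the other process, so the dst_pte == src_pte check would not skip the copy.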


Is that right?
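
For reference, the two BUG_ON conditions quoted above can be compared with a small user-space model. This is only a sketch under the assumption, taken from the changelog, that a migration entry is neither none, nor present, nor huge. enum model_pte and the model_* helpers are hypothetical and are not the kernel's pte_t or its accessors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the states a hugetlb pte can be in. */
enum model_pte {
	MODEL_NONE,		/* empty entry */
	MODEL_PRESENT_HUGE,	/* present and huge */
	MODEL_PRESENT_SMALL,	/* present but not huge (the real bug case) */
	MODEL_MIGRATION,	/* migration swap entry: not none, not present */
};

static bool model_pte_none(enum model_pte p) { return p == MODEL_NONE; }
static bool model_pte_huge(enum model_pte p) { return p == MODEL_PRESENT_HUGE; }
static bool model_pte_present(enum model_pte p)
{
	return p == MODEL_PRESENT_HUGE || p == MODEL_PRESENT_SMALL;
}

/* Old check: BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte)) */
static bool old_check_fires(enum model_pte p)
{
	return !model_pte_none(p) && !model_pte_huge(p);
}

/* Proposed check: BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)) */
static bool new_check_fires(enum model_pte p)
{
	return model_pte_present(p) && !model_pte_huge(p);
}

/* The two checks disagree only for the migration entry. */
static void model_self_check(void)
{
	assert(old_check_fires(MODEL_MIGRATION));
	assert(!new_check_fires(MODEL_MIGRATION));
	assert(old_check_fires(MODEL_PRESENT_SMALL));
	assert(new_check_fires(MODEL_PRESENT_SMALL));
}
```

In this model the proposed check only stops firing for the migration entry. It says nothing about the race itself, which is my remaining concern.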


