kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Longpeng (Mike)" <longpeng2@huawei.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <arei.gonglei@huawei.com>,
	<weidong.huang@huawei.com>, <weifuqiang@huawei.com>,
	<kvm@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH] mm/hugetlb: avoid get wrong ptep caused by race
Date: Sat, 22 Feb 2020 10:15:47 +0800	[thread overview]
Message-ID: <e2b9af10-b048-e5d2-e5b5-609622226a3c@huawei.com> (raw)
In-Reply-To: <a82956f7-26e4-5c1c-8d5d-4b2510f6b17d@oracle.com>

在 2020/2/21 8:22, Mike Kravetz 写道:
> On 2/19/20 6:30 PM, Longpeng (Mike) wrote:
>> 在 2020/2/20 3:33, Mike Kravetz 写道:
>>> + Kirill
>>> On 2/18/20 5:58 PM, Sean Christopherson wrote:
>>>> On Wed, Feb 19, 2020 at 09:39:59AM +0800, Longpeng (Mike) wrote:
> <snip>
>>>> The race and the fix make sense.  I assumed dereferencing garbage from the
>>>> huge page was the issue, but I wasn't 100% that was the case, which is why
>>>> I asked about alternative fixes.
>>>>
>>>>> We change the code from
>>>>> 	if (pud_huge(*pud) || !pud_present(*pud))
>>>>> to
>>>>> 	if (pud_huge(*pud)
>>>>> 		return (pte_t *)pud;
>>>>> 	busy loop for 500ms
>>>>> 	if (!pud_present(*pud))
>>>>> 		return (pte_t *)pud;
>>>>> and the panic will be hit quickly.
>>>>>
>>>>> ARM64 has already use READ/WRITE_ONCE to access the pagetable, look at this
>>>>> commit 20a004e7 (arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables).
>>>>>
>>>>> The root cause is: 'if (pud_huge(*pud) || !pud_present(*pud))' read entry from
>>>>> pud twice and the *pud maybe change in a race, so if we only read the pud once.
>>>>> I use READ_ONCE here is just for safe, to prevents the complier mischief if
>>>>> possible.
>>>>
>>>> FWIW, I'd be in favor of going the READ/WRITE_ONCE() route for x86, e.g.
>>>> convert everything as a follow-up patch (or patches).  I'm fairly confident
>>>> that KVM's usage of lookup_address_in_mm() is safe, but I wouldn't exactly
>>>> bet my life on it.  I'd much rather the failing scenario be that KVM uses
>>>> a sub-optimal page size as opposed to exploding on a bad pointer.
>>>
>>> Longpeng(Mike) asked in another e-mail specifically about making similar
>>> changes to lookup_address_in_mm().  Replying here as there is more context.
>>>
>>> I 'think' lookup_address_in_mm is safe from this issue.  Why?  IIUC, the
>>> problem with the huge_pte_offset routine is that the pud changes from
>>> pud_none() to pud_huge() in the middle of
>>> 'if (pud_huge(*pud) || !pud_present(*pud))'.  In the case of
>>> lookup_address_in_mm, we know pud was not pud_none() as it was previously
>>> checked.  I am not aware of any other state transitions which could cause
>>> us trouble.  However, I am no expert in this area.
> 
> Bad copy/paste by me.  Longpeng(Mike) was asking about lookup_address_in_pgd.
> 
>> So... I need just fix huge_pte_offset in mm/hugetlb.c, right?
> 
> Let's start with just a fix for huge_pte_offset() as you can easily reproduce
> that issue by adding a delay.
> 
>> Is it possible the pud changes from pud_huge() to pud_none() while another CPU
>> is walking the pagetable ?
> 
All right, I'll send V2 to fix it, thanks :)

> I believe it is possible.  If we hole punch a hugetlbfs file, we will clear
> the corresponding pud's.  Hence, we can go from pud_huge() to pud_none().
> Unless I am missing something, that does imply we could have issues in places
> such as lookup_address_in_pgd:
> 
> 	pud = pud_offset(p4d, address);
> 	if (pud_none(*pud))
> 		return NULL;
> 
> 	*level = PG_LEVEL_1G;
> 	if (pud_large(*pud) || !pud_present(*pud))
> 		return (pte_t *)pud;
> 
> I hope I am wrong, but it seems like pud_none(*pud) could become true after
> the initial check, and before the (pud_large) check.  If so, there could be
> a problem (addressing exception) when the code continues and looks up the pmd.
> 
> 	pmd = pmd_offset(pud, address);
> 	if (pmd_none(*pmd))
> 		return NULL;
> 
> It has been mentioned before that there are many page table walks like this.
> What am I missing that prevents races like this?  Or, have we just been lucky?
> 
That's what I worry about. Maybe there is no usecase to hit it.

-- 
Regards,
Longpeng(Mike)


  reply	other threads:[~2020-02-22  2:16 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-18 12:10 [PATCH] mm/hugetlb: avoid get wrong ptep caused by race Longpeng(Mike)
2020-02-18 20:37 ` Sean Christopherson
2020-02-19  0:51   ` Mike Kravetz
2020-02-19  1:39   ` Longpeng (Mike)
2020-02-19  1:58     ` Sean Christopherson
2020-02-19 12:21       ` Longpeng (Mike)
2020-02-19 16:22         ` Sean Christopherson
2020-02-20  2:32           ` Longpeng (Mike)
2020-02-19 19:33       ` Mike Kravetz
2020-02-20  2:30         ` Longpeng (Mike)
2020-02-21  0:22           ` Mike Kravetz
2020-02-22  2:15             ` Longpeng (Mike) [this message]
2020-02-18 20:52 ` Matthew Wilcox
2020-02-19  2:09   ` Longpeng (Mike)
2020-02-19  3:49     ` Mike Kravetz
2020-02-19 12:52       ` Longpeng (Mike)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e2b9af10-b048-e5d2-e5b5-609622226a3c@huawei.com \
    --to=longpeng2@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=arei.gonglei@huawei.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=weidong.huang@huawei.com \
    --cc=weifuqiang@huawei.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).