All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Longpeng (Mike)" <longpeng2@huawei.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <arei.gonglei@huawei.com>,
	<weidong.huang@huawei.com>, <weifuqiang@huawei.com>,
	<kvm@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH] mm/hugetlb: avoid get wrong ptep caused by race
Date: Sat, 22 Feb 2020 10:15:47 +0800	[thread overview]
Message-ID: <e2b9af10-b048-e5d2-e5b5-609622226a3c@huawei.com> (raw)
In-Reply-To: <a82956f7-26e4-5c1c-8d5d-4b2510f6b17d@oracle.com>

在 2020/2/21 8:22, Mike Kravetz 写道:
> On 2/19/20 6:30 PM, Longpeng (Mike) wrote:
>> 在 2020/2/20 3:33, Mike Kravetz 写道:
>>> + Kirill
>>> On 2/18/20 5:58 PM, Sean Christopherson wrote:
>>>> On Wed, Feb 19, 2020 at 09:39:59AM +0800, Longpeng (Mike) wrote:
> <snip>
>>>> The race and the fix make sense.  I assumed dereferencing garbage from the
>>>> huge page was the issue, but I wasn't 100% that was the case, which is why
>>>> I asked about alternative fixes.
>>>>
>>>>> We change the code from
>>>>> 	if (pud_huge(*pud) || !pud_present(*pud))
>>>>> to
>>>>> 	if (pud_huge(*pud)
>>>>> 		return (pte_t *)pud;
>>>>> 	busy loop for 500ms
>>>>> 	if (!pud_present(*pud))
>>>>> 		return (pte_t *)pud;
>>>>> and the panic will be hit quickly.
>>>>>
>>>>> ARM64 has already use READ/WRITE_ONCE to access the pagetable, look at this
>>>>> commit 20a004e7 (arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables).
>>>>>
>>>>> The root cause is: 'if (pud_huge(*pud) || !pud_present(*pud))' read entry from
>>>>> pud twice and the *pud maybe change in a race, so if we only read the pud once.
>>>>> I use READ_ONCE here is just for safe, to prevents the complier mischief if
>>>>> possible.
>>>>
>>>> FWIW, I'd be in favor of going the READ/WRITE_ONCE() route for x86, e.g.
>>>> convert everything as a follow-up patch (or patches).  I'm fairly confident
>>>> that KVM's usage of lookup_address_in_mm() is safe, but I wouldn't exactly
>>>> bet my life on it.  I'd much rather the failing scenario be that KVM uses
>>>> a sub-optimal page size as opposed to exploding on a bad pointer.
>>>
>>> Longpeng(Mike) asked in another e-mail specifically about making similar
>>> changes to lookup_address_in_mm().  Replying here as there is more context.
>>>
>>> I 'think' lookup_address_in_mm is safe from this issue.  Why?  IIUC, the
>>> problem with the huge_pte_offset routine is that the pud changes from
>>> pud_none() to pud_huge() in the middle of
>>> 'if (pud_huge(*pud) || !pud_present(*pud))'.  In the case of
>>> lookup_address_in_mm, we know pud was not pud_none() as it was previously
>>> checked.  I am not aware of any other state transitions which could cause
>>> us trouble.  However, I am no expert in this area.
> 
> Bad copy/paste by me.  Longpeng(Mike) was asking about lookup_address_in_pgd.
> 
>> So... I need just fix huge_pte_offset in mm/hugetlb.c, right?
> 
> Let's start with just a fix for huge_pte_offset() as you can easily reproduce
> that issue by adding a delay.
> 
>> Is it possible the pud changes from pud_huge() to pud_none() while another CPU
>> is walking the pagetable ?
> 
All right, I'll send V2 to fix it, thanks :)

> I believe it is possible.  If we hole punch a hugetlbfs file, we will clear
> the corresponding pud's.  Hence, we can go from pud_huge() to pud_none().
> Unless I am missing something, that does imply we could have issues in places
> such as lookup_address_in_pgd:
> 
> 	pud = pud_offset(p4d, address);
> 	if (pud_none(*pud))
> 		return NULL;
> 
> 	*level = PG_LEVEL_1G;
> 	if (pud_large(*pud) || !pud_present(*pud))
> 		return (pte_t *)pud;
> 
> I hope I am wrong, but it seems like pud_none(*pud) could become true after
> the initial check, and before the (pud_large) check.  If so, there could be
> a problem (addressing exception) when the code continues and looks up the pmd.
> 
> 	pmd = pmd_offset(pud, address);
> 	if (pmd_none(*pmd))
> 		return NULL;
> 
> It has been mentioned before that there are many page table walks like this.
> What am I missing that prevents races like this?  Or, have we just been lucky?
> 
That's what I worry about. Maybe there is no usecase to hit it.

-- 
Regards,
Longpeng(Mike)


  reply	other threads:[~2020-02-22  2:16 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-18 12:10 [PATCH] mm/hugetlb: avoid get wrong ptep caused by race Longpeng(Mike)
2020-02-18 20:37 ` Sean Christopherson
2020-02-19  0:51   ` Mike Kravetz
2020-02-19  1:39   ` Longpeng (Mike)
2020-02-19  1:58     ` Sean Christopherson
2020-02-19 12:21       ` Longpeng (Mike)
2020-02-19 16:22         ` Sean Christopherson
2020-02-20  2:32           ` Longpeng (Mike)
2020-02-19 19:33       ` Mike Kravetz
2020-02-20  2:30         ` Longpeng (Mike)
2020-02-21  0:22           ` Mike Kravetz
2020-02-22  2:15             ` Longpeng (Mike) [this message]
2020-02-18 20:52 ` Matthew Wilcox
2020-02-19  2:09   ` Longpeng (Mike)
2020-02-19  3:49     ` Mike Kravetz
2020-02-19 12:52       ` Longpeng (Mike)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e2b9af10-b048-e5d2-e5b5-609622226a3c@huawei.com \
    --to=longpeng2@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=arei.gonglei@huawei.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=weidong.huang@huawei.com \
    --cc=weifuqiang@huawei.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.