* + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree
@ 2020-03-28 22:10 akpm
2020-03-31 3:35 ` Mike Kravetz
0 siblings, 1 reply; 7+ messages in thread
From: akpm @ 2020-03-28 22:10 UTC
To: jgg, longpeng2, mike.kravetz, mm-commits, sean.j.christopherson,
stable, willy
The patch titled
Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Longpeng <longpeng2@huawei.com>
Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
Our machine hit a panic (addressing exception) after running
for a long time. The call trace is:
RIP: 0010:[<ffffffff9dff0587>] [<ffffffff9dff0587>] hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff9df9b71b>] ? unlock_page+0x2b/0x30
[<ffffffff9dff04a2>] ? hugetlb_fault+0x222/0xbe0
[<ffffffff9dff1405>] follow_hugetlb_page+0x175/0x540
[<ffffffff9e15b825>] ? cpumask_next_and+0x35/0x50
[<ffffffff9dfc7230>] __get_user_pages+0x2a0/0x7e0
[<ffffffff9dfc648d>] __get_user_pages_unlocked+0x15d/0x210
[<ffffffffc068cfc5>] __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
[<ffffffffc06b28be>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffc06b4b41>] tdp_page_fault+0x151/0x2d0 [kvm]
...
[<ffffffffc06a6f90>] kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
[<ffffffffc068d919>] kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
[<ffffffff9deaa8c2>] ? dequeue_signal+0x32/0x180
[<ffffffff9deae34d>] ? do_sigtimedwait+0xcd/0x230
[<ffffffff9e03aed0>] do_vfs_ioctl+0x3f0/0x540
[<ffffffff9e03b0c1>] SyS_ioctl+0xa1/0xc0
[<ffffffff9e53879b>] system_call_fastpath+0x22/0x27
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the following
code snippet:
    ...
    pud = pud_offset(p4d, addr);
    if (sz != PUD_SIZE && pud_none(*pud))
        return NULL;
    /* hugepage or swap? */
    if (pud_huge(*pud) || !pud_present(*pud))
        return (pte_t *)pud;
    pmd = pmd_offset(pud, addr);
    if (sz != PMD_SIZE && pmd_none(*pmd))
        return NULL;
    /* hugepage or swap? */
    if (pmd_huge(*pmd) || !pmd_present(*pmd))
        return (pte_t *)pmd;
    ...
The following sequence would trigger this bug:
1. CPU0: sz == PUD_SIZE and *pud == 0, so the first check does not return
2. CPU0: "pud_huge(*pud)" is false
3. CPU1: calls hugetlb_no_page() and sets *pud to xxxx8e7 (PRESENT)
4. CPU0: "!pud_present(*pud)" is false, so the walk continues
5. CPU0: pmd = pmd_offset(pud, addr), which may return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
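The race above can be modelled in user space. This is only a sketch under stated assumptions: entry_t, ENTRY_PRESENT, ENTRY_HUGE, LOAD_ONCE() and classify_pud() are hypothetical stand-ins, not kernel definitions, and the bit values are chosen only so that the xxxx8e7 value from the sequence reads as "present and huge". The point it illustrates is that every decision is taken from one snapshot of the slot, so a concurrent update cannot be seen halfway through the checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
typedef uint64_t entry_t;
#define ENTRY_PRESENT 0x001ULL
#define ENTRY_HUGE    0x080ULL

/* Force exactly one load of the slot (similar in spirit to READ_ONCE()). */
#define LOAD_ONCE(p) (*(const volatile entry_t *)(p))

enum walk_result { WALK_NULL, WALK_RETURN_PUD, WALK_DESCEND };

/* Decide what to do with a PUD-level slot using a single snapshot of it. */
static enum walk_result classify_pud(const entry_t *pudp, int want_pud_size)
{
    entry_t e;

    assert(pudp != NULL);
    e = LOAD_ONCE(pudp);            /* the only read of *pudp */

    if (!want_pud_size && e == 0)   /* models sz != PUD_SIZE && pud_none() */
        return WALK_NULL;
    if ((e & ENTRY_HUGE) || !(e & ENTRY_PRESENT))
        return WALK_RETURN_PUD;     /* huge page, swap entry or empty slot */
    return WALK_DESCEND;            /* present and points to a lower table */
}
```

With sz == PUD_SIZE, both the "empty" snapshot and the "present huge" snapshot lead to returning the PUD pointer, so it no longer matters which side of CPU1's store the single load lands on.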
Also, according to the section 'COMPILER BARRIER' of memory-barriers.txt:
'''
(*) The compiler is within its rights to reorder loads and stores
to the same variable, and in some cases, the CPU is within its
rights to reorder loads to the same variable. This means that
the following code:
a[0] = x;
a[1] = x;
Might result in an older value of x stored in a[1] than in a[0].
'''
There are several other data races in huge_pte_offset(), for example:
'''
p4d = p4d_offset(pgd, addr)
if (!p4d_present(*p4d))
    return NULL;
pud = pud_offset(p4d, addr) <-- expands to:
pud = (pud_t *)p4d_page_vaddr(*p4d) + pud_index(address);
'''
which the compiler/CPU is free to execute as:
'''
p4d = p4d_offset(pgd, addr)
p4d_for_vaddr = *p4d;
if (!p4d_present(*p4d))
    return NULL;
pud = (pud_t *)p4d_page_vaddr(p4d_for_vaddr) + pud_index(address);
'''
So if *p4d goes from '!present' to 'present' between the two loads,
p4d_present(*p4d) is true while p4d_for_vaddr is still none, and the
address computed by p4d_page_vaddr() is bogus, so the walk will crash.
For these reasons, we must make sure there is exactly one dereference of
p4d, pud and pmd.
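As a rough user-space illustration of "exactly one dereference": the sketch below loads the upper-level entry once and uses that same value both for the presence check and for locating the next-level table. All names here (entry_t, table_pool, entry_to_table(), descend_once()) and the entry encoding are assumptions made up for the example; they do not correspond to kernel helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t entry_t;

#define ENTRY_PRESENT 0x1ULL
#define POOL_TABLES   2
#define TABLE_SLOTS   4

/* Hypothetical encoding: bits 12 and up select a table from this pool. */
static entry_t table_pool[POOL_TABLES][TABLE_SLOTS];

static entry_t make_table_entry(unsigned int pool_idx)
{
    return ((entry_t)pool_idx << 12) | ENTRY_PRESENT;
}

static entry_t *entry_to_table(entry_t e)
{
    return table_pool[(e >> 12) % POOL_TABLES];
}

/*
 * Return the next-level slot, or NULL if the upper entry is not present.
 * *upper is read exactly once; the check and the address computation
 * both use the local snapshot.
 */
static entry_t *descend_once(const entry_t *upper, unsigned int idx)
{
    entry_t e = *(const volatile entry_t *)upper;   /* single load */

    assert(idx < TABLE_SLOTS);
    if (!(e & ENTRY_PRESENT))
        return NULL;
    return entry_to_table(e) + idx;   /* same value that passed the check */
}
```

Because the snapshot is a plain local value, the compiler has no second load of *upper that it could hoist above the presence check.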
Link: http://lkml.kernel.org/r/20200327235748.2048-1-longpeng2@huawei.com
Signed-off-by: Longpeng <longpeng2@huawei.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset
+++ a/mm/hugetlb.c
@@ -4909,29 +4909,33 @@ pte_t *huge_pte_offset(struct mm_struct
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
+ p4d_t *p4d, p4d_entry;
+ pud_t *pud, pud_entry;
+ pmd_t *pmd, pmd_entry;
pgd = pgd_offset(mm, addr);
if (!pgd_present(*pgd))
return NULL;
+
p4d = p4d_offset(pgd, addr);
- if (!p4d_present(*p4d))
+ p4d_entry = READ_ONCE(*p4d);
+ if (!p4d_present(p4d_entry))
return NULL;
- pud = pud_offset(p4d, addr);
- if (sz != PUD_SIZE && pud_none(*pud))
+ pud = pud_offset(&p4d_entry, addr);
+ pud_entry = READ_ONCE(*pud);
+ if (sz != PUD_SIZE && pud_none(pud_entry))
return NULL;
/* hugepage or swap? */
- if (pud_huge(*pud) || !pud_present(*pud))
+ if (pud_huge(pud_entry) || !pud_present(pud_entry))
return (pte_t *)pud;
- pmd = pmd_offset(pud, addr);
- if (sz != PMD_SIZE && pmd_none(*pmd))
+ pmd = pmd_offset(&pud_entry, addr);
+ pmd_entry = READ_ONCE(*pmd);
+ if (sz != PMD_SIZE && pmd_none(pmd_entry))
return NULL;
/* hugepage or swap? */
- if (pmd_huge(*pmd) || !pmd_present(*pmd))
+ if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
return (pte_t *)pmd;
return NULL;
_
Patches currently in -mm which might be from longpeng2@huawei.com are
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
* Re: + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree
2020-03-28 22:10 + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree akpm
@ 2020-03-31 3:35 ` Mike Kravetz
2020-03-31 4:44 ` Sean Christopherson
0 siblings, 1 reply; 7+ messages in thread
From: Mike Kravetz @ 2020-03-31 3:35 UTC
To: akpm, jgg, longpeng2, mm-commits, sean.j.christopherson, stable,
willy, Naresh Kamboju
On 3/28/20 3:10 PM, akpm@linux-foundation.org wrote:
> The patch titled
> Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
> has been added to the -mm tree. Its filename is
> mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
> From: Longpeng <longpeng2@huawei.com>
> Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
This patch is what caused the BUG reported on i386 non-PAE kernel here:
https://lore.kernel.org/linux-mm/CA+G9fYsJgZhhWLMzUxu_ZQ+THdCcJmFbHQ2ETA_YPP8M6yxOYA@mail.gmail.com/
As a clue, when building in this environment I get:
CC mm/hugetlb.o
mm/hugetlb.c: In function ‘huge_pte_offset’:
cc1: warning: function may return address of local variable [-Wreturn-local-addr]
mm/hugetlb.c:5361:14: note: declared here
pud_t *pud, pud_entry;
^~~~~~~~~
cc1: warning: function may return address of local variable [-Wreturn-local-addr]
mm/hugetlb.c:5361:14: note: declared here
cc1: warning: function may return address of local variable [-Wreturn-local-addr]
mm/hugetlb.c:5360:14: note: declared here
p4d_t *p4d, p4d_entry;
^~~~~~~~~
I'm shutting down for the night and will look into it more tomorrow if
someone else does not beat me to it.
--
Mike Kravetz
* Re: + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree
2020-03-31 3:35 ` Mike Kravetz
@ 2020-03-31 4:44 ` Sean Christopherson
2020-03-31 14:08 ` Jason Gunthorpe
0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2020-03-31 4:44 UTC
To: Mike Kravetz
Cc: akpm, jgg, longpeng2, mm-commits, stable, willy, Naresh Kamboju
On Mon, Mar 30, 2020 at 08:35:29PM -0700, Mike Kravetz wrote:
> This patch is what caused the BUG reported on i386 non-PAE kernel here:
>
> https://lore.kernel.org/linux-mm/CA+G9fYsJgZhhWLMzUxu_ZQ+THdCcJmFbHQ2ETA_YPP8M6yxOYA@mail.gmail.com/
>
> As a clue, when building in this environment I get:
>
> CC mm/hugetlb.o
> mm/hugetlb.c: In function ‘huge_pte_offset’:
> cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> mm/hugetlb.c:5361:14: note: declared here
> pud_t *pud, pud_entry;
> ^~~~~~~~~
> cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> mm/hugetlb.c:5361:14: note: declared here
> cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> mm/hugetlb.c:5360:14: note: declared here
> p4d_t *p4d, p4d_entry;
> ^~~~~~~~~
>
> I'm shutting down for the night and will look into it more tomorrow if
> someone else does not beat me to it.
Non-PAE uses ModeB / PSE paging, which only has 2-level page tables. The
non-existent levels get folded in and pmd_offset/pud_offset() return the
passed in pointer instead of accessing a table, e.g.:
static inline pmd_t * pmd_offset(pud_t * pud, unsigned long address)
{
return (pmd_t *)pud;
}
The bug probably only manifests with PSE paging because it can have huge
pages in the top-level table, i.e. it is the only mode that can get a false
positive.
This is arguably a bug in pmd_huge()/pud_huge(); it seems like they should
unconditionally return false if the relevant level doesn't exist.
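To make the folded-level effect concrete, here is a small user-space sketch. The types pud_model_t and pmd_model_t and both functions are hypothetical models for illustration only; folded_pmd_offset() merely mirrors the shape of the pmd_offset() quoted above. It shows that handing a folded offset helper the address of a local copy yields a pointer to that copy, not to the real table slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; they only stand in for pud_t and pmd_t. */
typedef struct { uint32_t val; } pud_model_t;
typedef struct { uint32_t val; } pmd_model_t;

/* Folded level: the helper hands back the pointer it was given. */
static pmd_model_t *folded_pmd_offset(pud_model_t *pud, unsigned long address)
{
    (void)address;
    return (pmd_model_t *)pud;
}

/*
 * Return 1 if the walk result points at the real table slot, 0 if it
 * points somewhere else (here: at a copy living on this function's stack).
 */
static int result_points_at_slot(pud_model_t *slot, int pass_local_copy)
{
    pud_model_t copy;
    pmd_model_t *pmd;

    assert(slot != NULL);
    copy = *slot;   /* snapshot, as the patch does with READ_ONCE() */
    pmd = folded_pmd_offset(pass_local_copy ? &copy : slot, 0);
    return (void *)pmd == (void *)slot;
}
```

If a caller returned the pmd pointer in the pass_local_copy case, it would be returning the address of a local variable, which is consistent with the -Wreturn-local-addr warnings reported earlier in the thread.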
* Re: + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree
2020-03-31 4:44 ` Sean Christopherson
@ 2020-03-31 14:08 ` Jason Gunthorpe
2020-03-31 21:58 ` Mike Kravetz
0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2020-03-31 14:08 UTC
To: Sean Christopherson
Cc: Mike Kravetz, akpm, longpeng2, mm-commits, stable, willy, Naresh Kamboju
On Mon, Mar 30, 2020 at 09:44:08PM -0700, Sean Christopherson wrote:
> On Mon, Mar 30, 2020 at 08:35:29PM -0700, Mike Kravetz wrote:
> > This patch is what caused the BUG reported on i386 non-PAE kernel here:
> >
> > https://lore.kernel.org/linux-mm/CA+G9fYsJgZhhWLMzUxu_ZQ+THdCcJmFbHQ2ETA_YPP8M6yxOYA@mail.gmail.com/
> >
> > As a clue, when building in this environment I get:
> >
> > CC mm/hugetlb.o
> > mm/hugetlb.c: In function ‘huge_pte_offset’:
> > cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> > mm/hugetlb.c:5361:14: note: declared here
> > pud_t *pud, pud_entry;
> > ^~~~~~~~~
> > cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> > mm/hugetlb.c:5361:14: note: declared here
> > cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> > mm/hugetlb.c:5360:14: note: declared here
> > p4d_t *p4d, p4d_entry;
> > ^~~~~~~~~
Yes, this is certainly very bad.
> Non-PAE uses ModeB / PSE paging, which only has 2-level page tables. The
> non-existent levels get folded in and pmd_offset/pud_offset() return the
> passed in pointer instead of accessing a table, e.g.:
>
> static inline pmd_t * pmd_offset(pud_t * pud, unsigned long address)
> {
> return (pmd_t *)pud;
> }
> The bug probably only manifests with PSE paging because it can have huge
> pages in the top-level table, i.e. it is the only mode that can get a false
> positive.
> This is arguably a bug in pmd_huge()/pud_huge(); it seems like they should
> unconditionally return false if the relevant level doesn't exist.
The issue is that, to get READ_ONCE semantics for a lockless flow, the
patch hackily defeats the dereference inside pXX_offset() by passing
in a pointer to a stack variable. This is fine unless you actually
care about the *address* of the result of pXX_offset(), which
huge_pte_offset() does.
I can't think of an easy fix here.
Andrew, I think this patch has to be dropped :(
Longpeng can fix the direct bug he saw by not changing the
pXX_offset() calls, but the extra dereference will remain a
theoretical/rare bug according to the memory model.
Maybe we need to change pXX_offset() to take in both the pointer and the
dereferenced value?
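A user-space sketch of what such a helper could look like, purely as an illustration of the idea: offset_real() and offset_folded() are hypothetical names, and the table_pool encoding is invented for the example. A real level locates the next table from the snapshot value, while a folded level returns the genuine slot pointer, so the caller never ends up holding the address of a stack copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t entry_t;

#define ENTRY_PRESENT 0x1ULL
#define POOL_TABLES   2
#define TABLE_SLOTS   4

/* Hypothetical encoding: bits 12 and up select a table from this pool. */
static entry_t table_pool[POOL_TABLES][TABLE_SLOTS];

static entry_t *entry_to_table(entry_t e)
{
    return table_pool[(e >> 12) % POOL_TABLES];
}

/* Real level: use the snapshot value and never re-read *upperp. */
static entry_t *offset_real(entry_t *upperp, entry_t upper, unsigned int idx)
{
    (void)upperp;
    assert(idx < TABLE_SLOTS);
    return entry_to_table(upper) + idx;
}

/* Folded level: the lower slot is the upper slot, so return the real pointer. */
static entry_t *offset_folded(entry_t *upperp, entry_t upper, unsigned int idx)
{
    (void)upper;
    (void)idx;
    assert(upperp != NULL);
    return upperp;
}
```

A caller would take one snapshot of *upperp, run its checks on the snapshot, and then pass both the pointer and the snapshot to whichever variant the configuration selects.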
Jason
* Re: + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree
2020-03-31 14:08 ` Jason Gunthorpe
@ 2020-03-31 21:58 ` Mike Kravetz
0 siblings, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2020-03-31 21:58 UTC
To: Jason Gunthorpe, Sean Christopherson
Cc: akpm, longpeng2, mm-commits, stable, willy, Naresh Kamboju
On 3/31/20 7:08 AM, Jason Gunthorpe wrote:
> I can't think of an easy fix here.
>
> Andrew, I think this patch has to be dropped :(
>
> Longpeng can fix the direct bug he saw by not changing the
> pXX_offset() calls, but the extra dereference will remain a
> theoretical/rare bug according to the memory model.
FWIW,
I tested Longpeng's V2 patch without the READ_ONCE for *pgd and *p4d
in this environment and it worked fine.
--
Mike Kravetz
* incoming
@ 2020-04-12 7:41 Andrew Morton
2020-04-13 20:51 ` + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2020-04-12 7:41 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
A straggler. This patch caused a lot of build errors on a lot of
architectures for a long time, but Anshuman believes it's all fixed up
now.
1 patch, based on GIT b032227c62939b5481bcd45442b36dfa263f4a7c.
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/debug: add tests validating architecture page table helpers
Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 34
arch/arc/Kconfig | 1
arch/arm64/Kconfig | 1
arch/powerpc/Kconfig | 1
arch/s390/Kconfig | 1
arch/x86/Kconfig | 1
arch/x86/include/asm/pgtable_64.h | 6
include/linux/mmdebug.h | 5
init/main.c | 2
lib/Kconfig.debug | 26
mm/Makefile | 1
mm/debug_vm_pgtable.c | 392 ++++++++++
12 files changed, 471 insertions(+)
* + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree
2020-04-12 7:41 incoming Andrew Morton
@ 2020-04-13 20:51 ` Andrew Morton
0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2020-04-13 20:51 UTC
To: jgg, longpeng2, mike.kravetz, mm-commits, sean.j.christopherson,
stable, willy
The patch titled
Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
------------------------------------------------------
From: Longpeng <longpeng2@huawei.com>
Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
Our machine hit a panic (addressing exception) after running for a long
time. The call trace is:
RIP: 0010:[<ffffffff9dff0587>] [<ffffffff9dff0587>] hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff9df9b71b>] ? unlock_page+0x2b/0x30
[<ffffffff9dff04a2>] ? hugetlb_fault+0x222/0xbe0
[<ffffffff9dff1405>] follow_hugetlb_page+0x175/0x540
[<ffffffff9e15b825>] ? cpumask_next_and+0x35/0x50
[<ffffffff9dfc7230>] __get_user_pages+0x2a0/0x7e0
[<ffffffff9dfc648d>] __get_user_pages_unlocked+0x15d/0x210
[<ffffffffc068cfc5>] __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
[<ffffffffc06b28be>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffc06b4b41>] tdp_page_fault+0x151/0x2d0 [kvm]
...
[<ffffffffc06a6f90>] kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
[<ffffffffc068d919>] kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
[<ffffffff9deaa8c2>] ? dequeue_signal+0x32/0x180
[<ffffffff9deae34d>] ? do_sigtimedwait+0xcd/0x230
[<ffffffff9e03aed0>] do_vfs_ioctl+0x3f0/0x540
[<ffffffff9e03b0c1>] SyS_ioctl+0xa1/0xc0
[<ffffffff9e53879b>] system_call_fastpath+0x22/0x27
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the
following code snippet:
    ...
    pud = pud_offset(p4d, addr);
    if (sz != PUD_SIZE && pud_none(*pud))
        return NULL;
    /* hugepage or swap? */
    if (pud_huge(*pud) || !pud_present(*pud))
        return (pte_t *)pud;
    pmd = pmd_offset(pud, addr);
    if (sz != PMD_SIZE && pmd_none(*pmd))
        return NULL;
    /* hugepage or swap? */
    if (pmd_huge(*pmd) || !pmd_present(*pmd))
        return (pte_t *)pmd;
    ...
The following sequence would trigger this bug:
1. CPU0: sz == PUD_SIZE and *pud == 0, so the first check does not return
2. CPU0: "pud_huge(*pud)" is false
3. CPU1: calls hugetlb_no_page() and sets *pud to xxxx8e7 (PRESENT)
4. CPU0: "!pud_present(*pud)" is false, so the walk continues
5. CPU0: pmd = pmd_offset(pud, addr), which may return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
We must make sure there is exactly one dereference of pud and pmd.
Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.com
Signed-off-by: Longpeng <longpeng2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset
+++ a/mm/hugetlb.c
@@ -5365,8 +5365,8 @@ pte_t *huge_pte_offset(struct mm_struct
{
pgd_t *pgd;
p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
+ pud_t *pud, pud_entry;
+ pmd_t *pmd, pmd_entry;
pgd = pgd_offset(mm, addr);
if (!pgd_present(*pgd))
@@ -5376,17 +5376,19 @@ pte_t *huge_pte_offset(struct mm_struct
return NULL;
pud = pud_offset(p4d, addr);
- if (sz != PUD_SIZE && pud_none(*pud))
+ pud_entry = READ_ONCE(*pud);
+ if (sz != PUD_SIZE && pud_none(pud_entry))
return NULL;
/* hugepage or swap? */
- if (pud_huge(*pud) || !pud_present(*pud))
+ if (pud_huge(pud_entry) || !pud_present(pud_entry))
return (pte_t *)pud;
pmd = pmd_offset(pud, addr);
- if (sz != PMD_SIZE && pmd_none(*pmd))
+ pmd_entry = READ_ONCE(*pmd);
+ if (sz != PMD_SIZE && pmd_none(pmd_entry))
return NULL;
/* hugepage or swap? */
- if (pmd_huge(*pmd) || !pmd_present(*pmd))
+ if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
return (pte_t *)pmd;
return NULL;
_
Patches currently in -mm which might be from longpeng2@huawei.com are
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
* incoming
@ 2020-02-04 1:33 Andrew Morton
2020-02-24 3:29 ` + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2020-02-04 1:33 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
The rest of MM and the rest of everything else.
Subsystems affected by this patch series:
hotfixes
mm/pagealloc
mm/memory-hotplug
ipc
misc
mm/cleanups
mm/pagemap
procfs
lib
cleanups
arm
Subsystem: hotfixes
Gang He <GHe@suse.com>:
ocfs2: fix oops when writing cloned file
David Hildenbrand <david@redhat.com>:
Patch series "mm: fix max_pfn not falling on section boundary", v2:
mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
fs/proc/page.c: allow inspection of last section and fix end detection
mm/page_alloc.c: initialize memmap of unavailable memory directly
Subsystem: mm/pagealloc
David Hildenbrand <david@redhat.com>:
mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
mm: factor out next_present_section_nr()
Subsystem: mm/memory-hotplug
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6:
mm/memmap_init: update variable name in memmap_init_zone
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()
mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn
mm/memory_hotplug: don't check for "all holes" in shrink_zone_span()
mm/memory_hotplug: drop local variables in shrink_zone_span()
mm/memory_hotplug: cleanup __remove_pages()
mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone()
Subsystem: ipc
Manfred Spraul <manfred@colorfullife.com>:
smp_mb__{before,after}_atomic(): update Documentation
Davidlohr Bueso <dave@stgolabs.net>:
ipc/mqueue.c: remove duplicated code
Manfred Spraul <manfred@colorfullife.com>:
ipc/mqueue.c: update/document memory barriers
ipc/msg.c: update and document memory barriers
ipc/sem.c: document and update memory barriers
Lu Shuaibing <shuaibinglu@126.com>:
ipc/msg.c: consolidate all xxxctl_down() functions
drivers/block/null_blk_main.c: fix layout
Subsystem: misc
Andrew Morton <akpm@linux-foundation.org>:
drivers/block/null_blk_main.c: fix layout
drivers/block/null_blk_main.c: fix uninitialized var warnings
Randy Dunlap <rdunlap@infradead.org>:
pinctrl: fix pxa2xx.c build warnings
Subsystem: mm/cleanups
Florian Westphal <fw@strlen.de>:
mm: remove __krealloc
Subsystem: mm/pagemap
Steven Price <steven.price@arm.com>:
Patch series "Generic page walk and ptdump", v17:
mm: add generic p?d_leaf() macros
arc: mm: add p?d_leaf() definitions
arm: mm: add p?d_leaf() definitions
arm64: mm: add p?d_leaf() definitions
mips: mm: add p?d_leaf() definitions
powerpc: mm: add p?d_leaf() definitions
riscv: mm: add p?d_leaf() definitions
s390: mm: add p?d_leaf() definitions
sparc: mm: add p?d_leaf() definitions
x86: mm: add p?d_leaf() definitions
mm: pagewalk: add p4d_entry() and pgd_entry()
mm: pagewalk: allow walking without vma
mm: pagewalk: don't lock PTEs for walk_page_range_novma()
mm: pagewalk: fix termination condition in walk_pte_range()
mm: pagewalk: add 'depth' parameter to pte_hole
x86: mm: point to struct seq_file from struct pg_state
x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct
x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
mm: add generic ptdump
x86: mm: convert dump_pagetables to use walk_page_range
arm64: mm: convert mm/dump.c to use walk_page_range()
arm64: mm: display non-present entries in ptdump
mm: ptdump: reduce level numbers by 1 in note_page()
x86: mm: avoid allocating struct mm_struct on the stack
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "Fixup page directory freeing", v4:
powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
Peter Zijlstra <peterz@infradead.org>:
mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
asm-generic/tlb: avoid potential double flush
asm-gemeric/tlb: remove stray function declarations
asm-generic/tlb: add missing CONFIG symbol
asm-generic/tlb: rename HAVE_RCU_TABLE_FREE
asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE
asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER
asm-generic/tlb: provide MMU_GATHER_TABLE_FREE
Subsystem: procfs
Alexey Dobriyan <adobriyan@gmail.com>:
proc: decouple proc from VFS with "struct proc_ops"
proc: convert everything to "struct proc_ops"
Subsystem: lib
Yury Norov <yury.norov@gmail.com>:
Patch series "lib: rework bitmap_parse", v5:
lib/string: add strnchrnul()
bitops: more BITS_TO_* macros
lib: add test for bitmap_parse()
lib: make bitmap_parse_user a wrapper on bitmap_parse
lib: rework bitmap_parse()
lib: new testcases for bitmap_parse{_user}
include/linux/cpumask.h: don't calculate length of the input string
Subsystem: cleanups
Masahiro Yamada <masahiroy@kernel.org>:
treewide: remove redundant IS_ERR() before error code check
Subsystem: arm
Chen-Yu Tsai <wens@csie.org>:
ARM: dma-api: fix max_pfn off-by-one error in __dma_supported()
Documentation/memory-barriers.txt | 14
arch/Kconfig | 17
arch/alpha/kernel/srm_env.c | 17
arch/arc/include/asm/pgtable.h | 1
arch/arm/Kconfig | 2
arch/arm/include/asm/pgtable-2level.h | 1
arch/arm/include/asm/pgtable-3level.h | 1
arch/arm/include/asm/tlb.h | 6
arch/arm/kernel/atags_proc.c | 8
arch/arm/mm/alignment.c | 14
arch/arm/mm/dma-mapping.c | 2
arch/arm64/Kconfig | 3
arch/arm64/Kconfig.debug | 19
arch/arm64/include/asm/pgtable.h | 2
arch/arm64/include/asm/ptdump.h | 8
arch/arm64/mm/Makefile | 4
arch/arm64/mm/dump.c | 152 ++----
arch/arm64/mm/mmu.c | 4
arch/arm64/mm/ptdump_debugfs.c | 2
arch/ia64/kernel/salinfo.c | 24 -
arch/m68k/kernel/bootinfo_proc.c | 8
arch/mips/include/asm/pgtable.h | 5
arch/mips/lasat/picvue_proc.c | 31 -
arch/powerpc/Kconfig | 7
arch/powerpc/include/asm/book3s/32/pgalloc.h | 8
arch/powerpc/include/asm/book3s/64/pgalloc.h | 2
arch/powerpc/include/asm/book3s/64/pgtable.h | 3
arch/powerpc/include/asm/nohash/pgalloc.h | 8
arch/powerpc/include/asm/tlb.h | 11
arch/powerpc/kernel/proc_powerpc.c | 10
arch/powerpc/kernel/rtas-proc.c | 70 +--
arch/powerpc/kernel/rtas_flash.c | 34 -
arch/powerpc/kernel/rtasd.c | 14
arch/powerpc/mm/book3s64/pgtable.c | 7
arch/powerpc/mm/numa.c | 12
arch/powerpc/platforms/pseries/lpar.c | 24 -
arch/powerpc/platforms/pseries/lparcfg.c | 14
arch/powerpc/platforms/pseries/reconfig.c | 8
arch/powerpc/platforms/pseries/scanlog.c | 15
arch/riscv/include/asm/pgtable-64.h | 7
arch/riscv/include/asm/pgtable.h | 7
arch/s390/Kconfig | 4
arch/s390/include/asm/pgtable.h | 2
arch/sh/mm/alignment.c | 17
arch/sparc/Kconfig | 3
arch/sparc/include/asm/pgtable_64.h | 2
arch/sparc/include/asm/tlb_64.h | 11
arch/sparc/kernel/led.c | 15
arch/um/drivers/mconsole_kern.c | 9
arch/um/kernel/exitcode.c | 15
arch/um/kernel/process.c | 15
arch/x86/Kconfig | 3
arch/x86/Kconfig.debug | 20
arch/x86/include/asm/pgtable.h | 10
arch/x86/include/asm/tlb.h | 4
arch/x86/kernel/cpu/mtrr/if.c | 21
arch/x86/mm/Makefile | 4
arch/x86/mm/debug_pagetables.c | 18
arch/x86/mm/dump_pagetables.c | 418 +++++-------------
arch/x86/platform/efi/efi_32.c | 2
arch/x86/platform/efi/efi_64.c | 4
arch/x86/platform/uv/tlb_uv.c | 14
arch/xtensa/platforms/iss/simdisk.c | 10
crypto/af_alg.c | 2
drivers/acpi/battery.c | 15
drivers/acpi/proc.c | 15
drivers/acpi/scan.c | 2
drivers/base/memory.c | 9
drivers/block/null_blk_main.c | 58 +-
drivers/char/hw_random/bcm2835-rng.c | 2
drivers/char/hw_random/omap-rng.c | 4
drivers/clk/clk.c | 2
drivers/dma/mv_xor_v2.c | 2
drivers/firmware/efi/arm-runtime.c | 2
drivers/gpio/gpiolib-devres.c | 2
drivers/gpio/gpiolib-of.c | 8
drivers/gpio/gpiolib.c | 2
drivers/hwmon/dell-smm-hwmon.c | 15
drivers/i2c/busses/i2c-mv64xxx.c | 5
drivers/i2c/busses/i2c-synquacer.c | 2
drivers/ide/ide-proc.c | 19
drivers/input/input.c | 28 -
drivers/isdn/capi/kcapi_proc.c | 6
drivers/macintosh/via-pmu.c | 17
drivers/md/md.c | 15
drivers/misc/sgi-gru/gruprocfs.c | 42 -
drivers/mtd/ubi/build.c | 2
drivers/net/wireless/cisco/airo.c | 126 ++---
drivers/net/wireless/intel/ipw2x00/libipw_module.c | 15
drivers/net/wireless/intersil/hostap/hostap_hw.c | 4
drivers/net/wireless/intersil/hostap/hostap_proc.c | 14
drivers/net/wireless/intersil/hostap/hostap_wlan.h | 2
drivers/net/wireless/ray_cs.c | 20
drivers/of/device.c | 2
drivers/parisc/led.c | 17
drivers/pci/controller/pci-tegra.c | 2
drivers/pci/proc.c | 25 -
drivers/phy/phy-core.c | 4
drivers/pinctrl/pxa/pinctrl-pxa2xx.c | 1
drivers/platform/x86/thinkpad_acpi.c | 15
drivers/platform/x86/toshiba_acpi.c | 60 +-
drivers/pnp/isapnp/proc.c | 9
drivers/pnp/pnpbios/proc.c | 17
drivers/s390/block/dasd_proc.c | 15
drivers/s390/cio/blacklist.c | 14
drivers/s390/cio/css.c | 11
drivers/scsi/esas2r/esas2r_main.c | 9
drivers/scsi/scsi_devinfo.c | 15
drivers/scsi/scsi_proc.c | 29 -
drivers/scsi/sg.c | 30 -
drivers/spi/spi-orion.c | 3
drivers/staging/rtl8192u/ieee80211/ieee80211_module.c | 14
drivers/tty/sysrq.c | 8
drivers/usb/gadget/function/rndis.c | 17
drivers/video/fbdev/imxfb.c | 2
drivers/video/fbdev/via/viafbdev.c | 105 ++--
drivers/zorro/proc.c | 9
fs/cifs/cifs_debug.c | 108 ++--
fs/cifs/dfs_cache.c | 13
fs/cifs/dfs_cache.h | 2
fs/ext4/super.c | 2
fs/f2fs/node.c | 2
fs/fscache/internal.h | 2
fs/fscache/object-list.c | 11
fs/fscache/proc.c | 2
fs/jbd2/journal.c | 13
fs/jfs/jfs_debug.c | 14
fs/lockd/procfs.c | 12
fs/nfsd/nfsctl.c | 13
fs/nfsd/stats.c | 12
fs/ocfs2/file.c | 14
fs/ocfs2/suballoc.c | 2
fs/proc/cpuinfo.c | 12
fs/proc/generic.c | 38 -
fs/proc/inode.c | 76 +--
fs/proc/internal.h | 5
fs/proc/kcore.c | 13
fs/proc/kmsg.c | 14
fs/proc/page.c | 54 +-
fs/proc/proc_net.c | 32 -
fs/proc/proc_sysctl.c | 2
fs/proc/root.c | 2
fs/proc/stat.c | 12
fs/proc/task_mmu.c | 4
fs/proc/vmcore.c | 10
fs/sysfs/group.c | 2
include/asm-generic/pgtable.h | 20
include/asm-generic/tlb.h | 138 +++--
include/linux/bitmap.h | 8
include/linux/bitops.h | 4
include/linux/cpumask.h | 4
include/linux/memory_hotplug.h | 4
include/linux/mm.h | 6
include/linux/mmzone.h | 10
include/linux/pagewalk.h | 49 +-
include/linux/proc_fs.h | 23
include/linux/ptdump.h | 24 -
include/linux/seq_file.h | 13
include/linux/slab.h | 1
include/linux/string.h | 1
include/linux/sunrpc/stats.h | 4
ipc/mqueue.c | 123 ++++-
ipc/msg.c | 62 +-
ipc/sem.c | 66 +-
ipc/util.c | 14
kernel/configs.c | 9
kernel/irq/proc.c | 42 -
kernel/kallsyms.c | 12
kernel/latencytop.c | 14
kernel/locking/lockdep_proc.c | 15
kernel/module.c | 12
kernel/profile.c | 24 -
kernel/sched/psi.c | 48 +-
lib/bitmap.c | 195 ++++----
lib/string.c | 17
lib/test_bitmap.c | 105 ++++
mm/Kconfig.debug | 21
mm/Makefile | 1
mm/gup.c | 2
mm/hmm.c | 66 +-
mm/memory_hotplug.c | 104 +---
mm/memremap.c | 2
mm/migrate.c | 5
mm/mincore.c | 1
mm/mmu_gather.c | 158 ++++--
mm/page_alloc.c | 75 +--
mm/pagewalk.c | 167 +++++--
mm/ptdump.c | 159 ++++++
mm/slab_common.c | 37 -
mm/sparse.c | 10
mm/swapfile.c | 14
net/atm/mpoa_proc.c | 17
net/atm/proc.c | 8
net/core/dev.c | 2
net/core/filter.c | 2
net/core/pktgen.c | 44 -
net/ipv4/ipconfig.c | 10
net/ipv4/netfilter/ipt_CLUSTERIP.c | 16
net/ipv4/route.c | 24 -
net/netfilter/xt_recent.c | 17
net/sunrpc/auth_gss/svcauth_gss.c | 10
net/sunrpc/cache.c | 45 -
net/sunrpc/stats.c | 21
net/xfrm/xfrm_policy.c | 2
samples/kfifo/bytestream-example.c | 11
samples/kfifo/inttype-example.c | 11
samples/kfifo/record-example.c | 11
scripts/coccinelle/free/devm_free.cocci | 4
sound/core/info.c | 34 -
sound/soc/codecs/ak4104.c | 3
sound/soc/codecs/cs4270.c | 3
sound/soc/codecs/tlv320aic32x4.c | 6
sound/soc/sunxi/sun4i-spdif.c | 2
tools/include/linux/bitops.h | 9
214 files changed, 2589 insertions(+), 2227 deletions(-)
* + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree
2020-02-04 1:33 incoming Andrew Morton
@ 2020-02-24 3:29 ` Andrew Morton
0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2020-02-24 3:29 UTC (permalink / raw)
To: longpeng2, mike.kravetz, mm-commits, sean.j.christopherson,
stable, willy
The patch titled
Subject: mm/hugetlb.c: fix a addressing exception caused by huge_pte_offset()
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Longpeng <longpeng2@huawei.com>
Subject: mm/hugetlb.c: fix a addressing exception caused by huge_pte_offset()
Our machine encountered a panic (addressing exception) after running for a
long time. The call trace is:
RIP: 0010:[<ffffffff9dff0587>] [<ffffffff9dff0587>] hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff9df9b71b>] ? unlock_page+0x2b/0x30
[<ffffffff9dff04a2>] ? hugetlb_fault+0x222/0xbe0
[<ffffffff9dff1405>] follow_hugetlb_page+0x175/0x540
[<ffffffff9e15b825>] ? cpumask_next_and+0x35/0x50
[<ffffffff9dfc7230>] __get_user_pages+0x2a0/0x7e0
[<ffffffff9dfc648d>] __get_user_pages_unlocked+0x15d/0x210
[<ffffffffc068cfc5>] __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
[<ffffffffc06b28be>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffc06b4b41>] tdp_page_fault+0x151/0x2d0 [kvm]
[<ffffffffc075731c>] ? vmx_vcpu_run+0x2ec/0xc80 [kvm_intel]
[<ffffffffc0757328>] ? vmx_vcpu_run+0x2f8/0xc80 [kvm_intel]
[<ffffffffc06abc11>] kvm_mmu_page_fault+0x31/0x140 [kvm]
[<ffffffffc074d1ae>] handle_ept_violation+0x9e/0x170 [kvm_intel]
[<ffffffffc075579c>] vmx_handle_exit+0x2bc/0xc70 [kvm_intel]
[<ffffffffc074f1a0>] ? __vmx_complete_interrupts.part.73+0x80/0xd0 [kvm_intel]
[<ffffffffc07574c0>] ? vmx_vcpu_run+0x490/0xc80 [kvm_intel]
[<ffffffffc069f3be>] vcpu_enter_guest+0x7be/0x13a0 [kvm]
[<ffffffffc06cf53e>] ? kvm_check_async_pf_completion+0x8e/0xb0 [kvm]
[<ffffffffc06a6f90>] kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
[<ffffffffc068d919>] kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
[<ffffffff9deaa8c2>] ? dequeue_signal+0x32/0x180
[<ffffffff9deae34d>] ? do_sigtimedwait+0xcd/0x230
[<ffffffff9e03aed0>] do_vfs_ioctl+0x3f0/0x540
[<ffffffff9e03b0c1>] SyS_ioctl+0xa1/0xc0
[<ffffffff9e53879b>] system_call_fastpath+0x22/0x27
We hit this on an older kernel, but after digging into the problem we
believe the latest kernel has the same bug.
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the
following code snippet:
	...
	pud = pud_offset(p4d, addr);
	if (sz != PUD_SIZE && pud_none(*pud))
		return NULL;
	/* hugepage or swap? */
	if (pud_huge(*pud) || !pud_present(*pud))
		return (pte_t *)pud;
	pmd = pmd_offset(pud, addr);
	if (sz != PMD_SIZE && pmd_none(*pmd))
		return NULL;
	/* hugepage or swap? */
	if (pmd_huge(*pmd) || !pmd_present(*pmd))
		return (pte_t *)pmd;
	...
The following sequence would trigger this bug:
1. CPU0: sz == PUD_SIZE and *pud == 0, so the first check does not return NULL
2. CPU0: "pud_huge(*pud)" is false
3. CPU1: calls hugetlb_no_page() and sets *pud to xxxx8e7 (PRESENT)
4. CPU0: "!pud_present(*pud)" is false, so it continues
5. CPU0: pmd = pmd_offset(pud, addr), which may return a wrong pmdp
However, we want CPU0 to return NULL or pudp.
We can avoid this race by reading the pud only once. In addition, we use
READ_ONCE() to access the entries for safety (i.e. to avoid compiler
mischief).
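The interleaving described above can be replayed deterministically in
userspace. The following is only a minimal sketch, not kernel code: every
sk_* name, the two flag values and the SK_READ_ONCE() macro are hypothetical
stand-ins invented for illustration, the sz check is left out, and the
"other CPU" is modelled as a callback invoked between the two tests. The
value 0x8e7 is the low part of the entry quoted in the sequence. The macro
relies on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch only; not the kernel's definitions. */
#define SK_PRESENT 0x1u		/* entry is valid */
#define SK_HUGE    0x80u	/* entry maps a huge page directly */

typedef uint64_t sk_entry;

/* Userspace approximation of READ_ONCE(): force a single load of x. */
#define SK_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Plays the role of CPU1; called between the two tests. May be NULL. */
typedef void (*sk_racer)(sk_entry *slot);

enum sk_result { SK_RETURN_SLOT, SK_DESCEND };

/* Buggy shape: the slot is dereferenced separately for each test. */
enum sk_result sk_lookup_racy(sk_entry *slot, sk_racer other_cpu)
{
	int huge = (*slot & SK_HUGE) != 0;		/* first read */

	if (other_cpu)
		other_cpu(slot);			/* CPU1 installs the page */

	int not_present = (*slot & SK_PRESENT) == 0;	/* second read */

	if (huge || not_present)
		return SK_RETURN_SLOT;
	return SK_DESCEND;	/* would treat a huge entry as a table pointer */
}

/* Fixed shape: one snapshot of the slot feeds both tests. */
enum sk_result sk_lookup_once(sk_entry *slot, sk_racer other_cpu)
{
	sk_entry entry = SK_READ_ONCE(*slot);

	if (other_cpu)
		other_cpu(slot);	/* a concurrent update cannot split the tests */

	if ((entry & SK_HUGE) || !(entry & SK_PRESENT))
		return SK_RETURN_SLOT;
	return SK_DESCEND;
}

/* CPU1 in the sequence above: install a present huge mapping. */
void sk_install_huge(sk_entry *slot)
{
	assert(slot != NULL);
	*slot = 0x8e7;	/* has both SK_PRESENT and SK_HUGE set */
}
```

Starting from an empty slot and passing sk_install_huge as the racer,
sk_lookup_racy() returns SK_DESCEND (the wrong pmdp case), while
sk_lookup_once() returns SK_RETURN_SLOT, because both tests see the same
snapshot regardless of when the update lands.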
Link: http://lkml.kernel.org/r/1582342427-230392-1-git-send-email-longpeng2@huawei.com
Signed-off-by: Longpeng <longpeng2@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset
+++ a/mm/hugetlb.c
@@ -4910,28 +4910,30 @@ pte_t *huge_pte_offset(struct mm_struct
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
+	pud_t *pud, pud_entry;
+	pmd_t *pmd, pmd_entry;
 
 	pgd = pgd_offset(mm, addr);
-	if (!pgd_present(*pgd))
+	if (!pgd_present(READ_ONCE(*pgd)))
 		return NULL;
 	p4d = p4d_offset(pgd, addr);
-	if (!p4d_present(*p4d))
+	if (!p4d_present(READ_ONCE(*p4d)))
 		return NULL;
 
 	pud = pud_offset(p4d, addr);
-	if (sz != PUD_SIZE && pud_none(*pud))
+	pud_entry = READ_ONCE(*pud);
+	if (sz != PUD_SIZE && pud_none(pud_entry))
 		return NULL;
 	/* hugepage or swap? */
-	if (pud_huge(*pud) || !pud_present(*pud))
+	if (pud_huge(pud_entry) || !pud_present(pud_entry))
 		return (pte_t *)pud;
 
 	pmd = pmd_offset(pud, addr);
-	if (sz != PMD_SIZE && pmd_none(*pmd))
+	pmd_entry = READ_ONCE(*pmd);
+	if (sz != PMD_SIZE && pmd_none(pmd_entry))
 		return NULL;
 	/* hugepage or swap? */
-	if (pmd_huge(*pmd) || !pmd_present(*pmd))
+	if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
 		return (pte_t *)pmd;
 
 	return NULL;
_
Patches currently in -mm which might be from longpeng2@huawei.com are
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
Thread overview: 7+ messages
2020-03-28 22:10 + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree akpm
2020-03-31 3:35 ` Mike Kravetz
2020-03-31 4:44 ` Sean Christopherson
2020-03-31 14:08 ` Jason Gunthorpe
2020-03-31 21:58 ` Mike Kravetz
-- strict thread matches above, loose matches on Subject: below --
2020-04-12 7:41 incoming Andrew Morton
2020-04-13 20:51 ` + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree Andrew Morton
2020-02-04 1:33 incoming Andrew Morton
2020-02-24 3:29 ` + mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch added to -mm tree Andrew Morton