All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Cyrill Gorcunov <gorcunov@gmail.com>
Subject: Re: mm: BUG in unmap_page_range
Date: Wed, 27 Aug 2014 16:26:22 +0100	[thread overview]
Message-ID: <20140827152622.GC12424@suse.de> (raw)
In-Reply-To: <53FD4D9F.6050500@oracle.com>

On Tue, Aug 26, 2014 at 11:16:47PM -0400, Sasha Levin wrote:
> On 08/11/2014 11:28 PM, Sasha Levin wrote:
> > On 08/05/2014 09:04 PM, Sasha Levin wrote:
> >> > Thanks Hugh, Mel. I've added both patches to my local tree and will update tomorrow
> >> > with the weather.
> >> > 
> >> > Also:
> >> > 
> >> > On 08/05/2014 08:42 PM, Hugh Dickins wrote:
> >>> >> One thing I did wonder, though: at first I was reassured by the
> >>> >> VM_BUG_ON(!pte_present(pte)) you add to pte_mknuma(); but then thought
> >>> >> it would be better as VM_BUG_ON(!(val & _PAGE_PRESENT)), being stronger
> >>> >> - asserting that indeed we do not put NUMA hints on PROT_NONE areas.
> >>> >> (But I have not tested, perhaps such a VM_BUG_ON would actually fire.)
> >> > 
> >> > I've added VM_BUG_ON(!(val & _PAGE_PRESENT)) in just as a curiosity, I'll
> >> > update how that one looks as well.
> > Sorry for the rather long delay.
> > 
> > The patch looks fine, the issue didn't reproduce.
> > 
> > The added VM_BUG_ON didn't trigger either, so maybe we should consider adding
> > it in.
> 
> It took a while, but I've managed to hit that VM_BUG_ON:
> 
> [  707.975456] kernel BUG at include/asm-generic/pgtable.h:724!
> [  707.977147] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  707.978974] Dumping ftrace buffer:
> [  707.980110]    (ftrace buffer empty)
> [  707.981221] Modules linked in:
> [  707.982312] CPU: 18 PID: 9488 Comm: trinity-c538 Not tainted 3.17.0-rc2-next-20140826-sasha-00031-gc48c9ac-dirty #1079
> [  707.982801] task: ffff880165e28000 ti: ffff880165e30000 task.ti: ffff880165e30000
> [  707.982801] RIP: 0010:[<ffffffffb42e3dda>]  [<ffffffffb42e3dda>] change_protection_range+0x94a/0x970
> [  707.982801] RSP: 0018:ffff880165e33d98  EFLAGS: 00010246
> [  707.982801] RAX: 000000009d340902 RBX: ffff880511204a08 RCX: 0000000000000100
> [  707.982801] RDX: 000000009d340902 RSI: 0000000041741000 RDI: 000000009d340902
> [  707.982801] RBP: ffff880165e33e88 R08: ffff880708a23c00 R09: 0000000000b52000
> [  707.982801] R10: 0000000000001e01 R11: 0000000000000008 R12: 0000000041751000
> [  707.982801] R13: 00000000000000f7 R14: 000000009d340902 R15: 0000000041741000
> [  707.982801] FS:  00007f358a9aa700(0000) GS:ffff88071c600000(0000) knlGS:0000000000000000
> [  707.982801] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  707.982801] CR2: 00007f3586b69490 CR3: 0000000165d88000 CR4: 00000000000006a0
> [  707.982801] Stack:
> [  707.982801]  ffff8804db88d058 0000000000000000 ffff88070fb17cf0 0000000000000000
> [  707.982801]  ffff880165d88000 0000000000000000 ffff8801686a5000 000000004163e000
> [  707.982801]  ffff8801686a5000 0000000000000001 0000000000000025 0000000041750fff
> [  707.982801] Call Trace:
> [  707.982801]  [<ffffffffb42e3e14>] change_protection+0x14/0x30
> [  707.982801]  [<ffffffffb42fda3b>] change_prot_numa+0x1b/0x40
> [  707.982801]  [<ffffffffb41ad766>] task_numa_work+0x1f6/0x330
> [  707.982801]  [<ffffffffb41937c4>] task_work_run+0xc4/0xf0
> [  707.982801]  [<ffffffffb40712e7>] do_notify_resume+0x97/0xb0
> [  707.982801]  [<ffffffffb74fd6ea>] int_signal+0x12/0x17
> [  707.982801] Code: e8 2c 84 21 03 e9 72 ff ff ff 0f 1f 80 00 00 00 00 0f 0b 48 8b 7d a8 4c 89 f2 4c 89 fe e8 9f 7b 03 00 e9 47 f9 ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 48 8b b5 70 ff ff ff 4c 89 ea 48 89 c7 e8 10 d5 01
> [  707.982801] RIP  [<ffffffffb42e3dda>] change_protection_range+0x94a/0x970
> [  707.982801]  RSP <ffff880165e33d98>
> 

The tests to reach here are

pte_present     any of  _PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA
pte_numa        only    _PAGE_NUMA out of _PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA
VM_BUG_ON	not set _PAGE_PRESENT

To trigger the bug the PTE bits must then be _PAGE_PROTNONE | _PAGE_NUMA. The
NUMA PTE scanner is skipping PROT_NONE VMAs so it should be "impossible"
for it to be set there. The mmap_sem is held for read during scans so
the protections should not be altering underneath us and the PTL is held
against parallel faults.

That leaves setting prot_none leaveing _PAGE_NUMA behind. Potentially
that's an issue due to

/* Set of bits not changed in pte_modify */
#define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
                         _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
                         _PAGE_SOFT_DIRTY | _PAGE_NUMA)

The _PAGE_NUMA bit is not cleared as removing it potentially leaves the
PTE in an unexpected state due to a "present" PTE marked for NUMA hinting
fault becoming non-present. Instead there is this check in change_pte_range()
to move PTEs to a known state before changing protections

                                if (pte_numa(ptent))
                                        ptent = pte_mknonnuma(ptent);
                                ptent = pte_modify(ptent, newprot);

So right now, I'm not seeing what path gets us to this inconsistent
state. Sasha, how long does it typically take to trigger this? Are you
using any particular switches for trinity that would trigger the bug
faster?

This untested patch might help pinpoint the source of the corruption
early though it's x86 only

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 281870f..ffea570 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -723,6 +723,9 @@ static inline pte_t pte_mknuma(pte_t pte)
 
 	VM_BUG_ON(!(val & _PAGE_PRESENT));
 
+	/* debugging only, specific to x86 */
+	VM_BUG_ON(val & _PAGE_PROTNONE);
+
 	val &= ~_PAGE_PRESENT;
 	val |= _PAGE_NUMA;
 

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Cyrill Gorcunov <gorcunov@gmail.com>
Subject: Re: mm: BUG in unmap_page_range
Date: Wed, 27 Aug 2014 16:26:22 +0100	[thread overview]
Message-ID: <20140827152622.GC12424@suse.de> (raw)
In-Reply-To: <53FD4D9F.6050500@oracle.com>

On Tue, Aug 26, 2014 at 11:16:47PM -0400, Sasha Levin wrote:
> On 08/11/2014 11:28 PM, Sasha Levin wrote:
> > On 08/05/2014 09:04 PM, Sasha Levin wrote:
> >> > Thanks Hugh, Mel. I've added both patches to my local tree and will update tomorrow
> >> > with the weather.
> >> > 
> >> > Also:
> >> > 
> >> > On 08/05/2014 08:42 PM, Hugh Dickins wrote:
> >>> >> One thing I did wonder, though: at first I was reassured by the
> >>> >> VM_BUG_ON(!pte_present(pte)) you add to pte_mknuma(); but then thought
> >>> >> it would be better as VM_BUG_ON(!(val & _PAGE_PRESENT)), being stronger
> >>> >> - asserting that indeed we do not put NUMA hints on PROT_NONE areas.
> >>> >> (But I have not tested, perhaps such a VM_BUG_ON would actually fire.)
> >> > 
> >> > I've added VM_BUG_ON(!(val & _PAGE_PRESENT)) in just as a curiosity, I'll
> >> > update how that one looks as well.
> > Sorry for the rather long delay.
> > 
> > The patch looks fine, the issue didn't reproduce.
> > 
> > The added VM_BUG_ON didn't trigger either, so maybe we should consider adding
> > it in.
> 
> It took a while, but I've managed to hit that VM_BUG_ON:
> 
> [  707.975456] kernel BUG at include/asm-generic/pgtable.h:724!
> [  707.977147] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  707.978974] Dumping ftrace buffer:
> [  707.980110]    (ftrace buffer empty)
> [  707.981221] Modules linked in:
> [  707.982312] CPU: 18 PID: 9488 Comm: trinity-c538 Not tainted 3.17.0-rc2-next-20140826-sasha-00031-gc48c9ac-dirty #1079
> [  707.982801] task: ffff880165e28000 ti: ffff880165e30000 task.ti: ffff880165e30000
> [  707.982801] RIP: 0010:[<ffffffffb42e3dda>]  [<ffffffffb42e3dda>] change_protection_range+0x94a/0x970
> [  707.982801] RSP: 0018:ffff880165e33d98  EFLAGS: 00010246
> [  707.982801] RAX: 000000009d340902 RBX: ffff880511204a08 RCX: 0000000000000100
> [  707.982801] RDX: 000000009d340902 RSI: 0000000041741000 RDI: 000000009d340902
> [  707.982801] RBP: ffff880165e33e88 R08: ffff880708a23c00 R09: 0000000000b52000
> [  707.982801] R10: 0000000000001e01 R11: 0000000000000008 R12: 0000000041751000
> [  707.982801] R13: 00000000000000f7 R14: 000000009d340902 R15: 0000000041741000
> [  707.982801] FS:  00007f358a9aa700(0000) GS:ffff88071c600000(0000) knlGS:0000000000000000
> [  707.982801] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  707.982801] CR2: 00007f3586b69490 CR3: 0000000165d88000 CR4: 00000000000006a0
> [  707.982801] Stack:
> [  707.982801]  ffff8804db88d058 0000000000000000 ffff88070fb17cf0 0000000000000000
> [  707.982801]  ffff880165d88000 0000000000000000 ffff8801686a5000 000000004163e000
> [  707.982801]  ffff8801686a5000 0000000000000001 0000000000000025 0000000041750fff
> [  707.982801] Call Trace:
> [  707.982801]  [<ffffffffb42e3e14>] change_protection+0x14/0x30
> [  707.982801]  [<ffffffffb42fda3b>] change_prot_numa+0x1b/0x40
> [  707.982801]  [<ffffffffb41ad766>] task_numa_work+0x1f6/0x330
> [  707.982801]  [<ffffffffb41937c4>] task_work_run+0xc4/0xf0
> [  707.982801]  [<ffffffffb40712e7>] do_notify_resume+0x97/0xb0
> [  707.982801]  [<ffffffffb74fd6ea>] int_signal+0x12/0x17
> [  707.982801] Code: e8 2c 84 21 03 e9 72 ff ff ff 0f 1f 80 00 00 00 00 0f 0b 48 8b 7d a8 4c 89 f2 4c 89 fe e8 9f 7b 03 00 e9 47 f9 ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 48 8b b5 70 ff ff ff 4c 89 ea 48 89 c7 e8 10 d5 01
> [  707.982801] RIP  [<ffffffffb42e3dda>] change_protection_range+0x94a/0x970
> [  707.982801]  RSP <ffff880165e33d98>
> 

The tests to reach here are

pte_present     any of  _PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA
pte_numa        only    _PAGE_NUMA out of _PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA
VM_BUG_ON	not set _PAGE_PRESENT

To trigger the bug the PTE bits must then be _PAGE_PROTNONE | _PAGE_NUMA. The
NUMA PTE scanner is skipping PROT_NONE VMAs so it should be "impossible"
for it to be set there. The mmap_sem is held for read during scans so
the protections should not be altering underneath us and the PTL is held
against parallel faults.

That leaves setting prot_none leaveing _PAGE_NUMA behind. Potentially
that's an issue due to

/* Set of bits not changed in pte_modify */
#define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
                         _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
                         _PAGE_SOFT_DIRTY | _PAGE_NUMA)

The _PAGE_NUMA bit is not cleared as removing it potentially leaves the
PTE in an unexpected state due to a "present" PTE marked for NUMA hinting
fault becoming non-present. Instead there is this check in change_pte_range()
to move PTEs to a known state before changing protections

                                if (pte_numa(ptent))
                                        ptent = pte_mknonnuma(ptent);
                                ptent = pte_modify(ptent, newprot);

So right now, I'm not seeing what path gets us to this inconsistent
state. Sasha, how long does it typically take to trigger this? Are you
using any particular switches for trinity that would trigger the bug
faster?

This untested patch might help pinpoint the source of the corruption
early though it's x86 only

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 281870f..ffea570 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -723,6 +723,9 @@ static inline pte_t pte_mknuma(pte_t pte)
 
 	VM_BUG_ON(!(val & _PAGE_PRESENT));
 
+	/* debugging only, specific to x86 */
+	VM_BUG_ON(val & _PAGE_PROTNONE);
+
 	val &= ~_PAGE_PRESENT;
 	val |= _PAGE_NUMA;
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-08-27 15:26 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-02 21:58 mm: BUG in unmap_page_range Sasha Levin
2014-08-02 21:58 ` Sasha Levin
2014-08-04 11:40 ` Hugh Dickins
2014-08-04 11:40   ` Hugh Dickins
2014-08-05 14:44   ` Mel Gorman
2014-08-05 14:44     ` Mel Gorman
2014-08-06  0:42     ` Hugh Dickins
2014-08-06  0:42       ` Hugh Dickins
2014-08-06  1:04       ` Sasha Levin
2014-08-06  1:04         ` Sasha Levin
2014-08-12  3:28         ` Sasha Levin
2014-08-12  3:28           ` Sasha Levin
2014-08-12 10:47           ` [PATCH] x86,mm: fix pte_special versus pte_numa Mel Gorman
2014-08-12 10:47             ` Mel Gorman
2014-08-12 11:08             ` [PATCH] mm: Remove misleading ARCH_USES_NUMA_PROT_NONE Mel Gorman
2014-08-12 11:08               ` Mel Gorman
2014-08-13 13:14               ` Aneesh Kumar K.V
2014-08-13 13:14                 ` Aneesh Kumar K.V
2014-08-27  3:16           ` mm: BUG in unmap_page_range Sasha Levin
2014-08-27  3:16             ` Sasha Levin
2014-08-27 15:26             ` Mel Gorman [this message]
2014-08-27 15:26               ` Mel Gorman
2014-08-27 18:21               ` Sasha Levin
2014-08-27 18:21                 ` Sasha Levin
2014-08-30  1:23               ` Sasha Levin
2014-08-30  1:23                 ` Sasha Levin
2014-09-04  9:04                 ` Sasha Levin
2014-09-04  9:04                   ` Sasha Levin
2014-09-08 17:18                   ` Mel Gorman
2014-09-08 17:18                     ` Mel Gorman
2014-09-08 17:23                     ` Sasha Levin
2014-09-08 17:56                     ` Sasha Levin
2014-09-08 17:56                       ` Sasha Levin
2014-09-09 21:33                       ` Mel Gorman
2014-09-09 21:33                         ` Mel Gorman
2014-09-09 22:20                         ` Sasha Levin
2014-09-09 22:20                           ` Sasha Levin
2014-09-10  2:45                           ` Hugh Dickins
2014-09-10  2:45                             ` Hugh Dickins
2014-09-10 12:47                             ` Mel Gorman
2014-09-10 12:47                               ` Mel Gorman
2014-09-10 14:24                               ` Trinity and mbind flags (WAS: Re: mm: BUG in unmap_page_range) Sasha Levin
2014-09-10 14:24                                 ` Sasha Levin
2014-09-10 14:33                                 ` Dave Jones
2014-09-10 14:33                                   ` Dave Jones
2014-09-10 19:06                               ` mm: BUG in unmap_page_range Sasha Levin
2014-09-10 19:06                                 ` Sasha Levin
2014-09-10 19:36                               ` Hugh Dickins
2014-09-10 19:36                                 ` Hugh Dickins
2014-09-11  2:43                                 ` Sasha Levin
2014-09-11  2:43                                   ` Sasha Levin
2014-09-11 11:39                                   ` Hugh Dickins
2014-09-11 11:39                                     ` Hugh Dickins
2014-09-11 14:22                                     ` Sasha Levin
2014-09-11 14:22                                       ` Sasha Levin
2014-09-11 14:33                                       ` Dave Jones
2014-09-11 14:33                                         ` Dave Jones
2014-09-11 16:28                                     ` Mel Gorman
2014-09-11 16:28                                       ` Mel Gorman
2014-09-11 22:38                                       ` Sasha Levin
2014-09-11 22:38                                         ` Sasha Levin
2014-09-17 21:37                                         ` Sasha Levin
2014-09-17 21:37                                           ` Sasha Levin
2014-09-10 13:12                             ` Sasha Levin
2014-09-10 13:12                               ` Sasha Levin
2014-09-10 13:40                               ` Mel Gorman
2014-09-10 13:40                                 ` Mel Gorman
2014-09-10 16:44                                 ` Sasha Levin
2014-09-10 16:44                                   ` Sasha Levin
2014-09-10 19:09                               ` Hugh Dickins
2014-09-10 19:09                                 ` Hugh Dickins
2014-09-10 20:36                                 ` Sasha Levin
2014-09-10 20:36                                   ` Sasha Levin
2014-09-10 23:00                                   ` Hugh Dickins
2014-09-10 23:00                                     ` Hugh Dickins
2014-08-06 10:35       ` Mel Gorman
2014-08-06 10:35         ` Mel Gorman
2014-08-06  7:14     ` Aneesh Kumar K.V
2014-08-06  7:14       ` Aneesh Kumar K.V
2014-08-06  7:14       ` Aneesh Kumar K.V
2014-08-06 10:23       ` Mel Gorman
2014-08-06 10:23         ` Mel Gorman
2014-08-06 10:23         ` Mel Gorman
2014-08-07  8:40         ` Aneesh Kumar K.V
2014-08-07  8:40           ` Aneesh Kumar K.V
2014-08-07  8:40           ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140827152622.GC12424@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=gorcunov@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.