* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
       [not found] <20150418205656.GA7972@pd.tnic>
@ 2015-04-18 21:27 ` Linus Torvalds
  2015-04-18 21:56   ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2015-04-18 21:27 UTC (permalink / raw)
  To: Borislav Petkov, Naoya Horiguchi, Michal Hocko
  Cc: Andrew Morton, x86-ml, linux-mm

On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> so I'm running some intermediate state of linus/master + tip/master from
> Thursday and probably I shouldn't be even taking such splat seriously
> and wait until 4.1-rc1 has been done but let me report it just in case
> so that it is out there, in case someone else sees it too.
>
> I don't have a reproducer yet except the fact that it happened twice
> already, the second time while watching the new Star Wars teaser on
> youtube (current->comm is "AudioThread" probably from chrome, as shown
> in the splat below).

Hmm. The only recent commit in this area seems to be 822fc61367f0
("mm: don't call __page_cache_release for hugetlb") although I don't
see why it would cause anything like that. But it changes code that
has been stable for many years, which makes me wonder how valid it is
(__put_compound_page() has been unchanged since 2011, and now suddenly
it grew that "!PageHuge()" test).

So quite frankly, I'd almost suggest changing that

        if (!PageHuge(page))
                __page_cache_release(page);

back to the old unconditional __page_cache_release(page), and maybe add a single

        WARN_ON_ONCE(PageHuge(page));

just to see if that condition actually happens. The new comment says
it shouldn't happen and that the change shouldn't matter, but...
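The suggestion above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct mock_page`, `mock_page_huge()`, `mock_page_cache_release()`, `warn_on_once()` and `put_compound_sketch()` are hypothetical stand-ins, not the kernel's definitions, and the warn-once helper keeps a single flag because the sketch has a single call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct mock_page {
	bool huge;	/* what a PageHuge()-style test would report */
	int released;	/* how often the mock release helper ran */
};

static int warn_count;	/* number of warnings actually printed */

/* Simplified warn-once: prints on the first true condition only. */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

static bool mock_page_huge(const struct mock_page *page)
{
	return page->huge;
}

static void mock_page_cache_release(struct mock_page *page)
{
	page->released++;
}

/* Release unconditionally again, but report the "cannot happen" case once. */
static void put_compound_sketch(struct mock_page *page)
{
	assert(page != NULL);
	warn_on_once(mock_page_huge(page), "PageHuge(page)");
	mock_page_cache_release(page);
}
```

Calling `put_compound_sketch()` repeatedly on a page with `huge` set still runs the release helper every time, but only the first call prints a warning.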

Of course, your recent BUG_ON may well be entirely unrelated to this
change in mm/swap.c, but it *is* in kind of the same area, and the
timing would match too...

             Linus

---
[115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
[115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
[115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
[115258.874179] kernel BUG at mm/swap.c:134!
[115258.874262] RIP: put_compound_page+0x3b9/0x480
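
For readers decoding the flags line in the dump above, here is a minimal sketch. The bit positions (2, 4 and 15) are assumptions inferred from this one dump (0x8014 printed as referenced|dirty|tail); the real numbering depends on kernel version and configuration, and `decode_page_flags()` is a hypothetical helper, not a kernel function. The top bit of the value is not a named flag in this table and is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct flag_name {
	unsigned int bit;
	const char *name;
};

/* Assumed bit positions, consistent with the dump above. */
static const struct flag_name sketch_flags[] = {
	{ 2, "referenced" },
	{ 4, "dirty" },
	{ 15, "tail" },
};

/* Writes "a|b|c" into buf and returns how many names matched. */
static int decode_page_flags(uint64_t flags, char *buf, size_t len)
{
	int n = 0;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(sketch_flags) / sizeof(sketch_flags[0]); i++) {
		if (!(flags & (UINT64_C(1) << sketch_flags[i].bit)))
			continue;
		if (n++)
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, sketch_flags[i].name, len - strlen(buf) - 1);
	}
	return n;
}
```

Feeding it the value from the dump, `0x8000000000008014`, yields the same three names the kernel printed.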

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-18 21:27 ` kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0) Linus Torvalds
@ 2015-04-18 21:56   ` Kirill A. Shutemov
  2015-04-18 21:59     ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2015-04-18 21:56 UTC (permalink / raw)
  To: Linus Torvalds, Borislav Petkov
  Cc: Naoya Horiguchi, Michal Hocko, Andrew Morton, x86-ml, linux-mm,
	Andrea Arcangeli, Hugh Dickins

On Sat, Apr 18, 2015 at 05:27:49PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@alien8.de> wrote:
> >
> > so I'm running some intermediate state of linus/master + tip/master from
> > Thursday and probably I shouldn't be even taking such splat seriously
> > and wait until 4.1-rc1 has been done but let me report it just in case
> > so that it is out there, in case someone else sees it too.
> >
> > I don't have a reproducer yet except the fact that it happened twice
> > already, the second time while watching the new Star Wars teaser on
> > youtube (current->comm is "AudioThread" probably from chrome, as shown
> > in the splat below).

I would guess it's related to sound: the most common source of PTE-mapped
compound pages in userspace.

> Hmm. The only recent commit in this area seems to be 822fc61367f0
> ("mm: don't call __page_cache_release for hugetlb") although I don't
> see why it would cause anything like that. But it changes code that
> has been stable for many years, which makes me wonder how valid it is
> (__put_compound_page() has been unchanged since 2011, and now suddenly
> it grew that "!PageHuge()" test).
> 
> So quite frankly, I'd almost suggest changing that
> 
>         if (!PageHuge(page))
>                 __page_cache_release(page);
> 
> back to the old unconditional __page_cache_release(page), and maybe add a single
> 
>         WARN_ON_ONCE(PageHuge(page));
> 
> just to see if that condition actually happens. The new comment says
> it shouldn't happen and that the change shouldn't matter, but...
> 
> Of course, your recent BUG_ON may well be entirely unrelated to this
> change in mm/swap.c, but it *is* in kind of the same area, and the
> timing would match too...

Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
cause. I don't see why the commit could break anything, but it's worth
trying to revert and test.

Borislav, could you try?


> ---
> [115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
> [115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
> [115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
> [115258.874179] kernel BUG at mm/swap.c:134!
> [115258.874262] RIP: put_compound_page+0x3b9/0x480

-- 
 Kirill A. Shutemov


* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-18 21:56   ` Kirill A. Shutemov
@ 2015-04-18 21:59     ` Linus Torvalds
  2015-04-18 22:08       ` Borislav Petkov
  2015-04-18 22:12       ` Linus Torvalds
  0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2015-04-18 21:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Borislav Petkov, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
> cause. I don't see why the commit could break anything, but it's worth
> trying to revert and test.

Ahh, yes, that does look like a more likely culprit.

         Linus


* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-18 21:59     ` Linus Torvalds
@ 2015-04-18 22:08       ` Borislav Petkov
  2015-04-18 22:16           ` Borislav Petkov
  2015-04-22 13:12         ` Borislav Petkov
  2015-04-18 22:12       ` Linus Torvalds
  1 sibling, 2 replies; 17+ messages in thread
From: Borislav Petkov @ 2015-04-18 22:08 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov
  Cc: Naoya Horiguchi, Michal Hocko, Andrew Morton, x86-ml, linux-mm,
	Andrea Arcangeli, Hugh Dickins

On Sat, Apr 18, 2015 at 05:59:53PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> >
> > Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
> > cause. I don't see why the commit could break anything, but it's worth
> > trying to revert and test.
> 
> Ahh, yes, that does look like a more likely culprit.

Reverted and building... will report back in the next few days.

Thanks guys.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-18 21:59     ` Linus Torvalds
  2015-04-18 22:08       ` Borislav Petkov
@ 2015-04-18 22:12       ` Linus Torvalds
  2015-04-20  0:02         ` Naoya Horiguchi
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2015-04-18 22:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Borislav Petkov, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

On Sat, Apr 18, 2015 at 5:59 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
>>
>> Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
>> cause. I don't see why the commit could break anything, but it's worth
>> trying to revert and test.
>
> Ahh, yes, that does look like a more likely culprit.

That said, I do think we should likely also do that

        WARN_ON_ONCE(PageHuge(page));

in __put_compound_page() rather than just silently saying "no refcount
changes for this magical case that shouldn't even happen".  If it
shouldn't happen, then we should warn about it, not try to "handle"
some case that shouldn't happen and shouldn't matter.

Let's not play games in this area. This code has been stable for many
years, why are we suddenly doing random things here? There's something
to be said for "if it ain't broke..", and there's *definitely* a lot
to be said for "let's not complicate this even more".

             Linus


* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-18 22:08       ` Borislav Petkov
@ 2015-04-18 22:16           ` Borislav Petkov
  2015-04-22 13:12         ` Borislav Petkov
  1 sibling, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2015-04-18 22:16 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov
  Cc: Naoya Horiguchi, Michal Hocko, Andrew Morton, x86-ml, linux-mm,
	Andrea Arcangeli, Hugh Dickins, lkml

Forgot to CC lkml for archiving purposes, here's the whole thread in
one:

---
Hi guys,

so I'm running some intermediate state of linus/master + tip/master from
Thursday and probably I shouldn't be even taking such splat seriously
and wait until 4.1-rc1 has been done but let me report it just in case
so that it is out there, in case someone else sees it too.

I don't have a reproducer yet except the fact that it happened twice
already, the second time while watching the new Star Wars teaser on
youtube (current->comm is "AudioThread" probably from chrome, as shown
in the splat below).

And srsly, to VM_BUG_ON_PAGE() while I'm watching the new Star Wars
teaser - you must be kidding me people!

Anyway, just FYI, someone might have an idea...

So here's the state of what I was running:

---
commit 3963e69e59fa4e36ac164e8cd520811135d868d3
Merge: 34c9a0ffc75a 11664e41b11e
Author: Borislav Petkov <bp@suse.de>
Date:   Thu Apr 16 13:39:44 2015 +0200

    Merge remote-tracking branch 'tip/master' into rc0+

commit 11664e41b11ed447f598424dd83ecf65400be5a1 (refs/remotes/tip/master)
Merge: 61a7fd4deb61 2df8406a439b
Author: Ingo Molnar <mingo@kernel.org>
Date:   Thu Apr 16 09:20:52 2015 +0200

    Merge branch 'sched/urgent'

commit eea3a00264cf243a28e4331566ce67b86059339d
Merge: e7c82412433a e693d73c20ff
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Apr 15 16:39:15 2015 -0700

    Merge branch 'akpm' (patches from Andrew)

    Merge second patchbomb from Andrew Morton:

---

and here's the splat:

---

[115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
[115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
[115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
[115258.874177] ------------[ cut here ]------------
[115258.874179] kernel BUG at mm/swap.c:134!
[115258.874182] invalid opcode: 0000 [#1] 
[115258.874183] PREEMPT 
[115258.874184] SMP 

[115258.874187] Modules linked in:
[115258.874189]  nls_iso8859_15
[115258.874190]  nls_cp437
[115258.874192]  ipt_MASQUERADE
[115258.874193]  nf_nat_masquerade_ipv4
[115258.874194]  iptable_mangle
[115258.874195]  iptable_nat
[115258.874196]  nf_conntrack_ipv4
[115258.874198]  nf_defrag_ipv4
[115258.874199]  nf_nat_ipv4
[115258.874200]  nf_nat
[115258.874201]  nf_conntrack
[115258.874202]  iptable_filter
[115258.874204]  ip_tables
[115258.874205]  x_tables
[115258.874206]  tun
[115258.874207]  sha256_ssse3
[115258.874209]  sha256_generic
[115258.874211]  binfmt_misc
[115258.874212]  ipv6
[115258.874213]  vfat
[115258.874214]  fat
[115258.874215]  fuse
[115258.874216]  dm_crypt
[115258.874217]  dm_mod
[115258.874219]  kvm_amd
[115258.874243]  kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd amd64_edac_mod edac_core fam15h_power k10temp amdkfd amd_iommu_v2 radeon drm_kms_helper ttm cfbfillrect cfbimgblt cfbcopyarea acpi_cpufreq
[115258.874248] CPU: 0 PID: 2904 Comm: AudioThread Not tainted 4.0.0+ #1
[115258.874250] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[115258.874252] task: ffff8803e8278000 ti: ffff8803f8a04000 task.ti: ffff8803f8a04000
[115258.874262] RIP: 0010:[<ffffffff8113fcb9>]  [<ffffffff8113fcb9>] put_compound_page+0x3b9/0x480
[115258.874264] RSP: 0018:ffff8803f8a07b98  EFLAGS: 00010246
[115258.874266] RAX: 000000000000003d RBX: ffffea0010a15040 RCX: 0000000000000000
[115258.874268] RDX: ffffffff8109f016 RSI: ffffffff810bb33f RDI: ffffffff810bae60
[115258.874270] RBP: ffff8803f8a07bb8 R08: 0000000000000001 R09: 0000000000000001
[115258.874271] R10: 0000000000000001 R11: 0000000000000001 R12: ffffea0010a15000
[115258.874273] R13: ffff8803f8a07e28 R14: ffffea0010a15040 R15: 0000000000000000
[115258.874276] FS:  00007f206f2af700(0000) GS:ffff88042c600000(0000) knlGS:0000000000000000
[115258.874278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[115258.874280] CR2: 00007f2095443310 CR3: 000000041e866000 CR4: 00000000000406f0
[115258.874281] Stack:
[115258.874287]  ffff8803f8a07e60 ffffea0010a151c0 ffff8803f8a07e28 ffffea0010a15040
[115258.874292]  ffff8803f8a07c28 ffffffff8113ffd0 0000000100000000 00000000ffffffff
[115258.874296]  ffff8803f8a07de8 ffff8803f8a07e60 ffff8803f8a07be8 ffff8803f8a07be8
[115258.874298] Call Trace:
[115258.874304]  [<ffffffff8113ffd0>] release_pages+0x250/0x270
[115258.874311]  [<ffffffff811736c5>] free_pages_and_swap_cache+0x95/0xb0
[115258.874317]  [<ffffffff8115ddc0>] tlb_flush_mmu_free+0x40/0x60
[115258.874323]  [<ffffffff8115fcac>] unmap_single_vma+0x69c/0x730
[115258.874331]  [<ffffffff81160594>] unmap_vmas+0x54/0xb0
[115258.874335]  [<ffffffff81165a38>] unmap_region+0xa8/0x110
[115258.874342]  [<ffffffff811679ea>] do_munmap+0x1ea/0x3f0
[115258.874346]  [<ffffffff81167c33>] ? vm_munmap+0x43/0x80
[115258.874350]  [<ffffffff81167c41>] vm_munmap+0x51/0x80
[115258.874354]  [<ffffffff81168bee>] SyS_munmap+0xe/0x20
[115258.874359]  [<ffffffff816918db>] system_call_fastpath+0x16/0x73
[115258.874424] Code: 81 48 89 df e8 29 c9 01 00 0f 0b 48 c7 c6 00 81 8a 81 4c 89 e7 e8 18 c9 01 00 0f 0b 48 c7 c6 d8 96 8b 81 48 89 df e8 07 c9 01 00 <0f> 0b 48 c7 c6 30 97 8b 81 48 89 df e8 f6 c8 01 00 0f 0b 48 c7 
[115258.874428] RIP  [<ffffffff8113fcb9>] put_compound_page+0x3b9/0x480
[115258.874429]  RSP <ffff8803f8a07b98>
[115258.898487] ---[ end trace 6ec080e8a6ee9fb1 ]---
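
The check that fired can be sketched as below. This is a hedged illustration, not the kernel source: `struct mock_page`, `mock_page_mapcount()` and `mapcount_check_fires()` are hypothetical names, and the convention that the raw counter starts at -1 for an unmapped page (so the reported mapcount is the raw value plus one) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock page holding only the one field this sketch needs. */
struct mock_page {
	int mapcount_raw;	/* assumed convention: -1 means "not mapped" */
};

/* Reported mapcount: raw counter plus one, so an unmapped page reads 0. */
static int mock_page_mapcount(const struct mock_page *page)
{
	assert(page != NULL);
	return page->mapcount_raw + 1;
}

/* True when the condition from the splat, mapcount != 0, would trigger. */
static bool mapcount_check_fires(const struct mock_page *page)
{
	return mock_page_mapcount(page) != 0;
}
```

With the page from the dump (count:0 mapcount:1), the condition is true: the page is being released while something still has it mapped.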

Thanks!

-- 
On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> so I'm running some intermediate state of linus/master + tip/master from
> Thursday and probably I shouldn't be even taking such splat seriously
> and wait until 4.1-rc1 has been done but let me report it just in case
> so that it is out there, in case someone else sees it too.
>
> I don't have a reproducer yet except the fact that it happened twice
> already, the second time while watching the new Star Wars teaser on
> youtube (current->comm is "AudioThread" probably from chrome, as shown
> in the splat below).

Hmm. The only recent commit in this area seems to be 822fc61367f0
("mm: don't call __page_cache_release for hugetlb") although I don't
see why it would cause anything like that. But it changes code that
has been stable for many years, which makes me wonder how valid it is
(__put_compound_page() has been unchanged since 2011, and now suddenly
it grew that "!PageHuge()" test).

So quite frankly, I'd almost suggest changing that

        if (!PageHuge(page))
                __page_cache_release(page);

back to the old unconditional __page_cache_release(page), and maybe add a single

        WARN_ON_ONCE(PageHuge(page));

just to see if that condition actually happens. The new comment says
it shouldn't happen and that the change shouldn't matter, but...

Of course, your recent BUG_ON may well be entirely unrelated to this
change in mm/swap.c, but it *is* in kind of the same area, and the
timing would match too...

             Linus

---
[115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
[115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
[115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
[115258.874179] kernel BUG at mm/swap.c:134!
[115258.874262] RIP: put_compound_page+0x3b9/0x480


---
On Sat, Apr 18, 2015 at 05:27:49PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@alien8.de> wrote:
> >
> > so I'm running some intermediate state of linus/master + tip/master from
> > Thursday and probably I shouldn't be even taking such splat seriously
> > and wait until 4.1-rc1 has been done but let me report it just in case
> > so that it is out there, in case someone else sees it too.
> >
> > I don't have a reproducer yet except the fact that it happened twice
> > already, the second time while watching the new Star Wars teaser on
> > youtube (current->comm is "AudioThread" probably from chrome, as shown
> > in the splat below).

I would guess it's related to sound: the most common source of PTE-mapped
compound pages in userspace.

> Hmm. The only recent commit in this area seems to be 822fc61367f0
> ("mm: don't call __page_cache_release for hugetlb") although I don't
> see why it would cause anything like that. But it changes code that
> has been stable for many years, which makes me wonder how valid it is
> (__put_compound_page() has been unchanged since 2011, and now suddenly
> it grew that "!PageHuge()" test).
> 
> So quite frankly, I'd almost suggest changing that
> 
>         if (!PageHuge(page))
>                 __page_cache_release(page);
> 
> back to the old unconditional __page_cache_release(page), and maybe add a single
> 
>         WARN_ON_ONCE(PageHuge(page));
> 
> just to see if that condition actually happens. The new comment says
> it shouldn't happen and that the change shouldn't matter, but...
> 
> Of course, your recent BUG_ON may well be entirely unrelated to this
> change in mm/swap.c, but it *is* in kind of the same area, and the
> timing would match too...

Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
cause. I don't see why the commit could break anything, but it's worth
trying to revert and test.

Borislav, could you try?


> ---
> [115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
> [115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
> [115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
> [115258.874179] kernel BUG at mm/swap.c:134!
> [115258.874262] RIP: put_compound_page+0x3b9/0x480

-- 
 Kirill A. Shutemov

---
On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
> cause. I don't see why the commit could break anything, but it's worth
> trying to revert and test.

Ahh, yes, that does look like a more likely culprit.

         Linus


---

On Sat, Apr 18, 2015 at 05:59:53PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> >
> > Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
> > cause. I don't see why the commit could break anything, but it's worth
> > trying to revert and test.
> 
> Ahh, yes, that does look like a more likely culprit.

Reverted and building... will report back in the next few days.

Thanks guys.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
@ 2015-04-18 22:16           ` Borislav Petkov
  0 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2015-04-18 22:16 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov
  Cc: Naoya Horiguchi, Michal Hocko, Andrew Morton, x86-ml, linux-mm,
	Andrea Arcangeli, Hugh Dickins, lkml

Forgot to CC lkml for archiving purposes, here's the whole thread in
one:

---
Hi guys,

so I'm running some intermediate state of linus/master + tip/master from
Thursday and probably I shouldn't be even taking such splat seriously
and wait until 4.1-rc1 has been done but let me report it just in case
so that it is out there, in case someone else sees it too.

I don't have a reproducer yet except the fact that it happened twice
already, the second time while watching the new Star Wars teaser on
youtube (current->comm is "AudioThread" probably from chrome, as shown
in the splat below).

And srsly, to VM_BUG_ON_PAGE() while I'm watching the new Star Wars
teaser - you must be kidding me people!

Anyway, just FYI, someone might have an idea...

So here's the state of what I was running:

---
commit 3963e69e59fa4e36ac164e8cd520811135d868d3
Merge: 34c9a0ffc75a 11664e41b11e
Author: Borislav Petkov <bp@suse.de>
Date:   Thu Apr 16 13:39:44 2015 +0200

    Merge remote-tracking branch 'tip/master' into rc0+

commit 11664e41b11ed447f598424dd83ecf65400be5a1 (refs/remotes/tip/master)
Merge: 61a7fd4deb61 2df8406a439b
Author: Ingo Molnar <mingo@kernel.org>
Date:   Thu Apr 16 09:20:52 2015 +0200

    Merge branch 'sched/urgent'

commit eea3a00264cf243a28e4331566ce67b86059339d
Merge: e7c82412433a e693d73c20ff
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Apr 15 16:39:15 2015 -0700

    Merge branch 'akpm' (patches from Andrew)

    Merge second patchbomb from Andrew Morton:

---

and here's the splat:

---

[115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
[115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
[115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
[115258.874177] ------------[ cut here ]------------
[115258.874179] kernel BUG at mm/swap.c:134!
[115258.874182] invalid opcode: 0000 [#1] 
[115258.874183] PREEMPT 
[115258.874184] SMP 

[115258.874187] Modules linked in:
[115258.874189]  nls_iso8859_15
[115258.874190]  nls_cp437
[115258.874192]  ipt_MASQUERADE
[115258.874193]  nf_nat_masquerade_ipv4
[115258.874194]  iptable_mangle
[115258.874195]  iptable_nat
[115258.874196]  nf_conntrack_ipv4
[115258.874198]  nf_defrag_ipv4
[115258.874199]  nf_nat_ipv4
[115258.874200]  nf_nat
[115258.874201]  nf_conntrack
[115258.874202]  iptable_filter
[115258.874204]  ip_tables
[115258.874205]  x_tables
[115258.874206]  tun
[115258.874207]  sha256_ssse3
[115258.874209]  sha256_generic
[115258.874211]  binfmt_misc
[115258.874212]  ipv6
[115258.874213]  vfat
[115258.874214]  fat
[115258.874215]  fuse
[115258.874216]  dm_crypt
[115258.874217]  dm_mod
[115258.874219]  kvm_amd
[115258.874243]  kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd amd64_edac_mod edac_core fam15h_power k10temp amdkfd amd_iommu_v2 radeon drm_kms_helper ttm cfbfillrect cfbimgblt cfbcopyarea acpi_cpufreq
[115258.874248] CPU: 0 PID: 2904 Comm: AudioThread Not tainted 4.0.0+ #1
[115258.874250] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[115258.874252] task: ffff8803e8278000 ti: ffff8803f8a04000 task.ti: ffff8803f8a04000
[115258.874262] RIP: 0010:[<ffffffff8113fcb9>]  [<ffffffff8113fcb9>] put_compound_page+0x3b9/0x480
[115258.874264] RSP: 0018:ffff8803f8a07b98  EFLAGS: 00010246
[115258.874266] RAX: 000000000000003d RBX: ffffea0010a15040 RCX: 0000000000000000
[115258.874268] RDX: ffffffff8109f016 RSI: ffffffff810bb33f RDI: ffffffff810bae60
[115258.874270] RBP: ffff8803f8a07bb8 R08: 0000000000000001 R09: 0000000000000001
[115258.874271] R10: 0000000000000001 R11: 0000000000000001 R12: ffffea0010a15000
[115258.874273] R13: ffff8803f8a07e28 R14: ffffea0010a15040 R15: 0000000000000000
[115258.874276] FS:  00007f206f2af700(0000) GS:ffff88042c600000(0000) knlGS:0000000000000000
[115258.874278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[115258.874280] CR2: 00007f2095443310 CR3: 000000041e866000 CR4: 00000000000406f0
[115258.874281] Stack:
[115258.874287]  ffff8803f8a07e60 ffffea0010a151c0 ffff8803f8a07e28 ffffea0010a15040
[115258.874292]  ffff8803f8a07c28 ffffffff8113ffd0 0000000100000000 00000000ffffffff
[115258.874296]  ffff8803f8a07de8 ffff8803f8a07e60 ffff8803f8a07be8 ffff8803f8a07be8
[115258.874298] Call Trace:
[115258.874304]  [<ffffffff8113ffd0>] release_pages+0x250/0x270
[115258.874311]  [<ffffffff811736c5>] free_pages_and_swap_cache+0x95/0xb0
[115258.874317]  [<ffffffff8115ddc0>] tlb_flush_mmu_free+0x40/0x60
[115258.874323]  [<ffffffff8115fcac>] unmap_single_vma+0x69c/0x730
[115258.874331]  [<ffffffff81160594>] unmap_vmas+0x54/0xb0
[115258.874335]  [<ffffffff81165a38>] unmap_region+0xa8/0x110
[115258.874342]  [<ffffffff811679ea>] do_munmap+0x1ea/0x3f0
[115258.874346]  [<ffffffff81167c33>] ? vm_munmap+0x43/0x80
[115258.874350]  [<ffffffff81167c41>] vm_munmap+0x51/0x80
[115258.874354]  [<ffffffff81168bee>] SyS_munmap+0xe/0x20
[115258.874359]  [<ffffffff816918db>] system_call_fastpath+0x16/0x73
[115258.874424] Code: 81 48 89 df e8 29 c9 01 00 0f 0b 48 c7 c6 00 81 8a 81 4c 89 e7 e8 18 c9 01 00 0f 0b 48 c7 c6 d8 96 8b 81 48 89 df e8 07 c9 01 00 <0f> 0b 48 c7 c6 30 97 8b 81 48 89 df e8 f6 c8 01 00 0f 0b 48 c7 
[115258.874428] RIP  [<ffffffff8113fcb9>] put_compound_page+0x3b9/0x480
[115258.874429]  RSP <ffff8803f8a07b98>
[115258.898487] ---[ end trace 6ec080e8a6ee9fb1 ]---

Thanks!

-- 
On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> so I'm running some intermediate state of linus/master + tip/master from
> Thursday and probably I shouldn't be even taking such splat seriously
> and wait until 4.1-rc1 has been done but let me report it just in case
> so that it is out there, in case someone else sees it too.
>
> I don't have a reproducer yet except the fact that it happened twice
> already, the second time while watching the new Star Wars teaser on
> youtube (current->comm is "AudioThread" probably from chrome, as shown
> in the splat below).

Hmm. The only recent commit in this area seems to be 822fc61367f0
("mm: don't call __page_cache_release for hugetlb") although I don't
see why it would cause anything like that. But it changes code that
has been stable for many years, which makes me wonder how valid it is
(__put_compound_page() has been unchanged since 2011, and now suddenly
it grew that "!PageHuge()" test).

So quite frankly, I'd almost suggest changing that

        if (!PageHuge(page))
                __page_cache_release(page);

back to the old unconditional __page_cache_release(page), and maybe add a single

        WARN_ON_ONCE(PageHuge(page));

just to see if that condition actually happens. The new comment says
it shouldn't happen and that the change shouldn't matter, but...

Of course, your recent BUG_ON may well be entirely unrelated to this
change in mm/swap.c, but it *is* in kind of the same area, and the
timing would match too...

             Linus

---
[115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:
    (null) index:0x0
[115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
[115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
[115258.874179] kernel BUG at mm/swap.c:134!
[115258.874262] RIP: put_compound_page+0x3b9/0x480


---
On Sat, Apr 18, 2015 at 05:27:49PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@alien8.de> wrote:
> >
> > so I'm running some intermediate state of linus/master + tip/master from
> > Thursday and probably I shouldn't be even taking such splat seriously
> > and wait until 4.1-rc1 has been done but let me report it just in case
> > so that it is out there, in case someone else sees it too.
> >
> > I don't have a reproducer yet except the fact that it happened twice
> > already, the second time while watching the new Star Wars teaser on
> > youtube (current->comm is "AudioThread" probably from chrome, as shown
> > in the splat below).

I would guess it's related to sound: the most common source of PTE-mapeed
compund pages into userspace.

> Hmm. The only recent commit in this area seems to be 822fc61367f0
> ("mm: don't call __page_cache_release for hugetlb") although I don't
> see why it would cause anything like that. But it changes code that
> has been stable for many years, which makes me wonder how valid it is
> (__put_compound_page() has been unchanged since 2011, and now suddenly
> it grew that "!PageHuge()" test).
> 
> So quite frankly, I'd almost suggest changing that
> 
>         if (!PageHuge(page))
>                 __page_cache_release(page);
> 
> back to the old unconditional __page_cache_release(page), and maybe add a single
> 
>         WARN_ON_ONCE(PageHuge(page));
> 
> just to see if that condition actually happens. The new comment says
> it shouldn't happen and that the change shouldn't matter, but...
> 
> Of course, your recent BUG_ON may well be entirely unrelated to this
> change in mm/swap.c, but it *is* in kind of the same area, and the
> timing would match too...

Andrea has already seen the bug and pointed to 8d63d99a5dfb as possible
cause. I don't see why the commit could broke anything, but it worth
trying to revert and test.

Borislav, could you try?


> ---
> [115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping:
>     (null) index:0x0
> [115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
> [115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
> [115258.874179] kernel BUG at mm/swap.c:134!
> [115258.874262] RIP: put_compound_page+0x3b9/0x480
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
 Kirill A. Shutemov

---
On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> Andrea has already seen the bug and pointed to 8d63d99a5dfb as possible
> cause. I don't see why the commit could broke anything, but it worth
> trying to revert and test.

Ahh, yes, that does look like a more likely culprit.

         Linus


---

On Sat, Apr 18, 2015 at 05:59:53PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> >
> > Andrea has already seen the bug and pointed to 8d63d99a5dfb as possible
> > cause. I don't see why the commit could broke anything, but it worth
> > trying to revert and test.
> 
> Ahh, yes, that does look like a more likely culprit.

Reverted and building... will report in the next few days.

Thanks guys.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-18 22:12       ` Linus Torvalds
@ 2015-04-20  0:02         ` Naoya Horiguchi
  0 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2015-04-20  0:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Borislav Petkov, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

On Sat, Apr 18, 2015 at 06:12:56PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 5:59 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> > <kirill@shutemov.name> wrote:
> >>
> >> Andrea has already seen the bug and pointed to 8d63d99a5dfb as possible
> >> cause. I don't see why the commit could broke anything, but it worth
> >> trying to revert and test.
> >
> > Ahh, yes, that does look like a more likely culprit.
> 
> That said, I do think we should likely also do that
> 
>         WARN_ON_ONCE(PageHuge(page));
> 
> in __put_compound_page() rather than just silently saying "no refcount
> changes for this magical case that shouldn't even happen".  If it
> shouldn't happen, then we should warn about it, not try to ":handle"
> some case that shouldn't happen and shouldn't matter.

__put_compound_page() can be called for PageHuge pages, so I don't think adding
WARN_ON_ONCE(PageHuge) is a good idea: it would make every hugetlb user see the
warning once per boot.
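
For illustration, the once-per-boot behaviour can be modelled in userspace
C. This is only a sketch: MODEL_WARN_ON_ONCE, model_put_compound_page() and
model_warning_count() are hypothetical names, not kernel APIs, and the real
WARN_ON_ONCE() additionally dumps a stack trace and evaluates to the
condition.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually emitted (model only). */
static int model_warnings_printed;

/*
 * Userspace model of a warn-once check. Each expansion site gets its own
 * static flag, so a given call site warns at most once per process
 * lifetime (the analogue of "once per boot").
 */
#define MODEL_WARN_ON_ONCE(cond)					\
	do {								\
		static bool warned;					\
		if ((cond) && !warned) {				\
			warned = true;					\
			model_warnings_printed++;			\
			fprintf(stderr, "WARNING: %s\n", #cond);	\
		}							\
	} while (0)

/*
 * Hypothetical stand-in for __put_compound_page(); is_huge models the
 * result of PageHuge(page).
 */
void model_put_compound_page(bool is_huge)
{
	MODEL_WARN_ON_ONCE(is_huge);
}

/* Number of warnings emitted so far. */
int model_warning_count(void)
{
	return model_warnings_printed;
}
```

In this model, any number of calls with is_huge == false stay silent, while
the first call with is_huge == true warns and later ones do not. That is the
behaviour a hugetlb user would see if the check were placed on a path that
hugetlb pages legitimately take.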

My thinking when I suggested this code was that __page_cache_release() does not
seem to be intended for hugetlb, but I'm not sure.
__put_compound_page() works without the !PageHuge check, which is there only to
guard against a potential future change in __put_compound_page().
So if everyone thinks that __put_compound_page() is stable and will never change
in the future, the !PageHuge check is totally unnecessary.

> Let's not play games in this area. This code has been stable for many
> years, why are we suddenly doing random things here? There's something
> to be said for "if it ain't broke..", and there's *definitely* a lot
> to be said for "let's not complicate this even more".

OK, so could you please try simply reverting 822fc61367f0 ?

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-18 22:08       ` Borislav Petkov
  2015-04-18 22:16           ` Borislav Petkov
@ 2015-04-22 13:12         ` Borislav Petkov
  2015-04-22 18:33           ` Kirill A. Shutemov
  1 sibling, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2015-04-22 13:12 UTC (permalink / raw)
  To: Linus Torvalds, Kirill A. Shutemov
  Cc: Naoya Horiguchi, Michal Hocko, Andrew Morton, x86-ml, linux-mm,
	Andrea Arcangeli, Hugh Dickins

On Sun, Apr 19, 2015 at 12:08:03AM +0200, Borislav Petkov wrote:
> On Sat, Apr 18, 2015 at 05:59:53PM -0400, Linus Torvalds wrote:
> > On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> > <kirill@shutemov.name> wrote:
> > >
> > > Andrea has already seen the bug and pointed to 8d63d99a5dfb as possible
> > > cause. I don't see why the commit could broke anything, but it worth
> > > trying to revert and test.
> > 
> > Ahh, yes, that does look like a more likely culprit.
> 
> Reverted and building... will report in the next days.

FWIW, box is solid with the revert and has an uptime of ~4 days so far
without hiccups.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-22 13:12         ` Borislav Petkov
@ 2015-04-22 18:33           ` Kirill A. Shutemov
  2015-04-22 18:40             ` Borislav Petkov
  2015-04-22 19:26             ` Linus Torvalds
  0 siblings, 2 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2015-04-22 18:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

On Wed, Apr 22, 2015 at 03:12:19PM +0200, Borislav Petkov wrote:
> On Sun, Apr 19, 2015 at 12:08:03AM +0200, Borislav Petkov wrote:
> > On Sat, Apr 18, 2015 at 05:59:53PM -0400, Linus Torvalds wrote:
> > > On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> > > <kirill@shutemov.name> wrote:
> > > >
> > > > Andrea has already seen the bug and pointed to 8d63d99a5dfb as possible
> > > > cause. I don't see why the commit could broke anything, but it worth
> > > > trying to revert and test.
> > > 
> > > Ahh, yes, that does look like a more likely culprit.
> > 
> > Reverted and building... will report in the next days.
> 
> FWIW, box is solid with the revert and has an uptime of ~4 days so far
> without hickups.

Could you try the patch below instead? It may give a clue about what's going on.

diff --git a/mm/swap.c b/mm/swap.c
index a7251a8ed532..0dff7004aa25 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -131,7 +131,11 @@ void put_unrefcounted_compound_page(struct page *page_head, struct page *page)
 		 * here, see the comment above this function.
 		 */
 		VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
-		VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
+		if (page_mapcount(page) != 0) {
+			dump_page(page_head, NULL);
+			dump_page(page, NULL);
+			BUG();
+		}
 		if (put_page_testzero(page_head)) {
 			/*
 			 * If this is the tail of a slab THP page,
-- 
 Kirill A. Shutemov


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-22 18:33           ` Kirill A. Shutemov
@ 2015-04-22 18:40             ` Borislav Petkov
  2015-04-22 18:43               ` Kirill A. Shutemov
  2015-04-22 19:26             ` Linus Torvalds
  1 sibling, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2015-04-22 18:40 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

On Wed, Apr 22, 2015 at 09:33:09PM +0300, Kirill A. Shutemov wrote:
> Could you try patch below instead? This can give a clue what's going on.

Well, this happens on my workstation and I need it for work. I'll try to
find another box to reproduce it on first. You could try to reproduce it
too - it happened here while playing videos on youtube in chromium. But
it is not easy to trigger, no particular use pattern. So I don't have a
sure-fire way of reproducing it.

I can send you my .config if you need it.

HTH

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-22 18:40             ` Borislav Petkov
@ 2015-04-22 18:43               ` Kirill A. Shutemov
  2015-04-22 19:18                 ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2015-04-22 18:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

On Wed, Apr 22, 2015 at 08:40:11PM +0200, Borislav Petkov wrote:
> On Wed, Apr 22, 2015 at 09:33:09PM +0300, Kirill A. Shutemov wrote:
> > Could you try patch below instead? This can give a clue what's going on.
> 
> Well, this happens on my workstation and I need it for work. I'll try to
> find another box to reproduce it on first. You could try to reproduce it
> too - it happened here while playing videos on youtube in chromium. But
> it is not easy to trigger, no particular use pattern. So I don't have a
> sure-fire way of reproducing it.

I've been running a kernel with this patch on my laptop for a few days
without a crash. :-/
 
> I can send you my .config if you need it.

Yes, please.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-22 18:43               ` Kirill A. Shutemov
@ 2015-04-22 19:18                 ` Borislav Petkov
  0 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2015-04-22 19:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

On Wed, Apr 22, 2015 at 09:43:49PM +0300, Kirill A. Shutemov wrote:
> I'm running kernel with this patch on my laptop for few day without a
> crash. :-/

If I'm to venture a wild guess, you probably need a bigger box for the
proper preconditions. I have 16G here. But just a wild guess anyway.

> Yes, please.

Attached.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

[-- Attachment #2: config-4.0.0+.bz2 --]
[-- Type: application/octet-stream, Size: 18657 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-22 18:33           ` Kirill A. Shutemov
  2015-04-22 18:40             ` Borislav Petkov
@ 2015-04-22 19:26             ` Linus Torvalds
  2015-04-23 16:23               ` Andrea Arcangeli
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2015-04-22 19:26 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Borislav Petkov, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Andrea Arcangeli, Hugh Dickins

On Wed, Apr 22, 2015 at 11:33 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> Could you try patch below instead? This can give a clue what's going on.

Just FYI, I've done the revert in my tree.

Trying to figure out what is going on despite that is obviously a good
idea, but I'm hoping that my merge window is winding down, so I am
trying to make sure it's all "good to go"..

           Linus


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-22 19:26             ` Linus Torvalds
@ 2015-04-23 16:23               ` Andrea Arcangeli
  2015-04-24 21:42                 ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: Andrea Arcangeli @ 2015-04-23 16:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Borislav Petkov, Naoya Horiguchi,
	Michal Hocko, Andrew Morton, x86-ml, linux-mm, Hugh Dickins

On Wed, Apr 22, 2015 at 12:26:55PM -0700, Linus Torvalds wrote:
> On Wed, Apr 22, 2015 at 11:33 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> >
> > Could you try patch below instead? This can give a clue what's going on.
> 
> Just FYI, I've done the revert in my tree.
> 
> Trying to figure out what is going on despite that is obviously a good
> idea, but I'm hoping that my merge window is winding down, so I am
> trying to make sure it's all "good to go"..

Sounds safer to defer it, agreed.

Unfortunately I can also reproduce it only on a workstation, where it
wasn't very handy to debug as it'd disrupt my workflow, and it isn't
equipped with reliable logging either (and the KMS mode didn't switch
to console to show me the oops). It just got logged once in syslog
before freezing.

The problem has to be that there's some get_page/put_page activity
before and after a PageAnon transition, and it looks like a tail page
got mapped by hand in userland by some driver using 4k ptes, which
isn't normal but was apparently safe before the patch was applied.
Before the patch, the tail page accounting would be symmetric
regardless of the PageAnon transition.

page:ffffea0010226040 count:0 mapcount:1 mapping:          (null) index:0x0
flags: 0x8000000000008010(dirty|tail)
page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
------------[ cut here ]------------
kernel BUG at mm/swap.c:134!
invalid opcode: 0000 [#1] SMP
Modules linked in: tun usbhid x86_pkg_temp_thermal kvm_intel kvm snd_hda_codec_realtek snd_hd
d_hda_intel xhci_pci ehci_hcd snd_hda_controller xhci_hcd snd_hda_codec snd_hda_core snd_pcm
d psmouse cdrom pcspkr usb_common [last unloaded: microcode]
CPU: 1 PID: 4175 Comm: knotify4 Not tainted 4.0.0+ #18
Hardware name:                  /DH61BE, BIOS BEH6110H.86A.0120.2013.1112.1412 11/12/2013
task: ffff88040ca231e0 ti: ffff8800bd088000 task.ti: ffff8800bd088000
RIP: 0010:[<ffffffff81148baa>]  [<ffffffff81148baa>] put_compound_page+0x31a/0x320
RSP: 0018:ffff8800bd08bc48  EFLAGS: 00010246
RAX: 000000000000003d RBX: ffffea0010226040 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88041f24d310
RBP: ffffea0010226000 R08: 0000000000000400 R09: ffffffff81ccaf54
R10: 00000000000002f3 R11: 00000000000002f2 R12: ffff8800bd08be20
R13: ffff8800bd08be70 R14: 00007ff3d9772000 R15: 0000000000000000
FS:  00007ff3d2693700(0000) GS:ffff88041f240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f877f137000 CR3: 00000003dd5b4000 CR4: 00000000000407e0
Stack:
ffff8800bd08bd0a ffffea0010226040 ffff8800bd08be38 ffffffff81148dd2
0000000000000002 00000000bd08be20 ffff8800bd08bc78 ffff8800bd08bc78
ffffea00102261c0 ffff8800bd08be20 ffff8800bd08bdf8 ffff8800bd08be20
Call Trace:
[<ffffffff81148dd2>] ? release_pages+0x222/0x260
[<ffffffff81160d80>] ? tlb_flush_mmu_free+0x30/0x50
[<ffffffff81162a00>] ? unmap_single_vma+0x580/0x810
[<ffffffff811634c1>] ? unmap_vmas+0x41/0x90
[<ffffffff81168125>] ? unmap_region+0x85/0xf0
[<ffffffff8116a17d>] ? do_munmap+0x21d/0x390
[<ffffffff8116a32a>] ? vm_munmap+0x3a/0x60
[<ffffffff8116b2ac>] ? SyS_munmap+0x1c/0x30
[<ffffffff8176d897>] ? system_call_fastpath+0x12/0x6a
Code: 81 48 89 ef e8 08 6d 01 00 0f 0b 48 c7 c6 f0 e9 9c 81 48 89 ef e8 f7 6c 01 00 0f 0b 48
e8 e6 6c 01 00 <0f> 0b 0f 1f 40 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 28 85
RIP  [<ffffffff81148baa>] put_compound_page+0x31a/0x320
RSP <ffff8800bd08bc48>
---[ end trace 81df9d42bd21b1f5 ]---


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-23 16:23               ` Andrea Arcangeli
@ 2015-04-24 21:42                 ` Kirill A. Shutemov
  2015-04-27 16:40                   ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2015-04-24 21:42 UTC (permalink / raw)
  To: Andrea Arcangeli, Linus Torvalds
  Cc: Borislav Petkov, Naoya Horiguchi, Michal Hocko, Andrew Morton,
	x86-ml, linux-mm, Hugh Dickins

On Thu, Apr 23, 2015 at 06:23:11PM +0200, Andrea Arcangeli wrote:
> On Wed, Apr 22, 2015 at 12:26:55PM -0700, Linus Torvalds wrote:
> > On Wed, Apr 22, 2015 at 11:33 AM, Kirill A. Shutemov
> > <kirill@shutemov.name> wrote:
> > >
> > > Could you try patch below instead? This can give a clue what's going on.
> > 
> > Just FYI, I've done the revert in my tree.
> > 
> > Trying to figure out what is going on despite that is obviously a good
> > idea, but I'm hoping that my merge window is winding down, so I am
> > trying to make sure it's all "good to go"..
> 
> Sounds safer to defer it, agreed.
> 
> Unfortunately I also can only reproduce it only on a workstation where
> it wasn't very handy to debug it as it'd disrupt my workflow and it
> isn't equipped with reliable logging either (and the KMS mode didn't
> switch to console to show me the oops either). It just got it logged
> once in syslog before freezing.
> 
> The problem has to be that there's some get_page/put_page activity
> before and after a PageAnon transition and it looks like a tail page
> got mapped by hand in userland by some driver using 4k ptes which
> isn't normal

Compound pages mapped with PTEs predate THP. See f3d48f0373c1.

> but apparently safe before the patch was applied. Before
> the patch, the tail page accounting would be symmetric regardless of
> the PageAnon transition.
> 
> page:ffffea0010226040 count:0 mapcount:1 mapping:          (null) index:0x0
> flags: 0x8000000000008010(dirty|tail)
> page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
> ------------[ cut here ]------------
> kernel BUG at mm/swap.c:134!

I looked into the code a bit more, and the VM_BUG_ON_PAGE() is bogus. See
the explanation in the commit message below.

Tail page refcounting is a mess. Please consider reviewing my patchset which
drops it [1]. ;)

Linus, how should we proceed with the reverted patch? Should I re-submit it to
Andrew? Or will you re-revert it?

[1] lkml.kernel.org/g/1429823043-157133-1-git-send-email-kirill.shutemov@linux.intel.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  2015-04-24 21:42                 ` Kirill A. Shutemov
@ 2015-04-27 16:40                   ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2015-04-27 16:40 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Borislav Petkov, Naoya Horiguchi, Michal Hocko,
	Andrew Morton, x86-ml, linux-mm, Hugh Dickins

Hello,

On Sat, Apr 25, 2015 at 12:42:25AM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 23, 2015 at 06:23:11PM +0200, Andrea Arcangeli wrote:
> > On Wed, Apr 22, 2015 at 12:26:55PM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 22, 2015 at 11:33 AM, Kirill A. Shutemov
> > > <kirill@shutemov.name> wrote:
> > > >
> > > > Could you try patch below instead? This can give a clue what's going on.
> > > 
> > > Just FYI, I've done the revert in my tree.
> > > 
> > > Trying to figure out what is going on despite that is obviously a good
> > > idea, but I'm hoping that my merge window is winding down, so I am
> > > trying to make sure it's all "good to go"..
> > 
> > Sounds safer to defer it, agreed.
> > 
> > Unfortunately I also can only reproduce it only on a workstation where
> > it wasn't very handy to debug it as it'd disrupt my workflow and it
> > isn't equipped with reliable logging either (and the KMS mode didn't
> > switch to console to show me the oops either). It just got it logged
> > once in syslog before freezing.
> > 
> > The problem has to be that there's some get_page/put_page activity
> > before and after a PageAnon transition and it looks like a tail page
> > got mapped by hand in userland by some driver using 4k ptes which
> > isn't normal
> 
> Compound pages mapped with PTEs predates THP. See f3d48f0373c1.

Yes, I meant "normal" as a general feeling, considering that it's your
new patchset that tries to introduce that behavior for regular anon
pages. I didn't mean to imply it was not ok for driver-owned pages,
sorry for the confusion.

> I looked into code a bit more. And the VM_BUG_ON_PAGE() is bogus. See
> explanation in commit message below.
> 
> Tail page refcounting is mess. Please consider reviewing my patchset which
> drops it [1]. ;)
> 
> Linus, how should we proceed with reverted patch? Should I re-submit it to
> Andrew? Or you'll re-revert it?

You could resubmit the old patch together with this patch, so they go
together.

In retrospect it may have been cleaner to pick another field than
mapcount for the tail page refcounting, then the VM_BUG_ON could have
been retained.

mapcount was picked as the candidate for tail page refcounting because
it already implemented a "count" too, so it was simpler to use than
another random 32bit word and didn't require further unions in the
page struct.

page_count cannot be used for refcounting tail pages of THP, or
speculative pagecache lookups would race against
split_huge_page_refcount. For non-THP it doesn't matter: even
page_count could have been used or, even better, no tail refcounting at
all, as with your patch.

With your patch you're basically disabling the tail page refcounting
for those usages so it probably doesn't matter anymore to move away
from mapcount and removing the VM_BUG_ON doesn't concern me by now (it
never actually triggered before).

Even for those usages doubling up the refcounting in mapcount, the
refcounting of 4.0 was safe. The false positive VM_BUG_ON could
happen only after the change.

> [1] lkml.kernel.org/g/1429823043-157133-1-git-send-email-kirill.shutemov@linux.intel.com
> 
> From 854cdc961b7f83f04a83144ab4f7459ae46b0f3d Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Fri, 24 Apr 2015 23:49:04 +0300
> Subject: [PATCH] mm: drop bogus VM_BUG_ON_PAGE assert in put_page() codepath
> 
> My patch 8d63d99a5dfb which was merged during 4.1 merge window caused
> regression:
> 
>   page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
>   flags: 0x8000000000008014(referenced|dirty|tail)
>   page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
>   ------------[ cut here ]------------
>   kernel BUG at mm/swap.c:134!
> 
> The problem can be reproduced by playing *two* audio files at the same
> time and then stopping one of players. I used two mplayers to trigger
> this.
> 
> The VM_BUG_ON_PAGE() which triggers the bug is bogus:
> 
> Sound subsystem uses compound pages for its buffers, but unlike most
> __GFP_COMP users sound maps compound pages to userspace with PTEs.
> 
> In our case with two players map the buffer twice and therefore elevates

I didn't think of the case of mapping the same compound page twice in
userland; this clearly explains the crash. I thought it had to be the
PageAnon flipping somehow but I couldn't explain where... and in fact
it was something else.
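
The double-mapping scenario can be sketched as a small userspace model. This
is not kernel code: struct model_tail_page and the model_*() helpers are
hypothetical, and the sketch assumes only that a tail page's mapcount counts
the PTEs currently mapping it. It compares the strict check that fired
(mapcount != 0) with a relaxed variant that rejects only a negative count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a tail page. */
struct model_tail_page {
	int mapcount;	/* number of PTEs currently mapping this tail page */
};

/* Model of one process mapping the tail page with a PTE. */
void model_map_pte(struct model_tail_page *page)
{
	page->mapcount++;
}

/* Model of munmap() tearing down one PTE mapping of the tail page. */
void model_unmap_pte(struct model_tail_page *page)
{
	assert(page->mapcount > 0);	/* cannot unmap what is not mapped */
	page->mapcount--;
}

/* Strict check: fires whenever any mapping is still present. */
bool model_strict_check_fires(const struct model_tail_page *page)
{
	return page->mapcount != 0;
}

/* Relaxed check: fires only on an impossible (negative) count. */
bool model_relaxed_check_fires(const struct model_tail_page *page)
{
	return page->mapcount < 0;
}

/*
 * Two players map the same buffer page, then one of them stops.
 * Returns the mapcount seen when the stopped player's reference is put.
 */
int model_two_players_one_stops(struct model_tail_page *page)
{
	page->mapcount = 0;
	model_map_pte(page);	/* first player maps the buffer */
	model_map_pte(page);	/* second player maps the same buffer */
	model_unmap_pte(page);	/* first player stops and unmaps */
	return page->mapcount;
}
```

Under these assumptions the remaining mapcount is 1, matching the
"mapcount:1" in the page dumps: the strict check fires although nothing is
wrong, while the relaxed check stays quiet.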

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Borislav Petkov <bp@alien8.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  mm/swap.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index a7251a8ed532..a3a0a2f1f7c3 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -131,7 +131,6 @@ void put_unrefcounted_compound_page(struct page *page_head, struct page *page)
>  		 * here, see the comment above this function.
>  		 */
>  		VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
> -		VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
>  		if (put_page_testzero(page_head)) {
>  			/*
>  			 * If this is the tail of a slab THP page,

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

If you resend a consolidated commit with the two changes in the same
commit, feel free to retain the Reviewed-by for both.

Optionally you could turn it into a VM_BUG_ON_PAGE(page_mapcount(page)
< 0, page) but it's up to you.

Thanks,
Andrea


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-04-27 16:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20150418205656.GA7972@pd.tnic>
2015-04-18 21:27 ` kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0) Linus Torvalds
2015-04-18 21:56   ` Kirill A. Shutemov
2015-04-18 21:59     ` Linus Torvalds
2015-04-18 22:08       ` Borislav Petkov
2015-04-18 22:16         ` Borislav Petkov
2015-04-18 22:16           ` Borislav Petkov
2015-04-22 13:12         ` Borislav Petkov
2015-04-22 18:33           ` Kirill A. Shutemov
2015-04-22 18:40             ` Borislav Petkov
2015-04-22 18:43               ` Kirill A. Shutemov
2015-04-22 19:18                 ` Borislav Petkov
2015-04-22 19:26             ` Linus Torvalds
2015-04-23 16:23               ` Andrea Arcangeli
2015-04-24 21:42                 ` Kirill A. Shutemov
2015-04-27 16:40                   ` Andrea Arcangeli
2015-04-18 22:12       ` Linus Torvalds
2015-04-20  0:02         ` Naoya Horiguchi
