linux-kernel.vger.kernel.org archive mirror
* kernel oops on mmotm-2015-10-15-15-20
@ 2015-10-21  5:28 Minchan Kim
  2015-10-21 11:07 ` Kirill A. Shutemov
  2015-10-22  2:15 ` Hugh Dickins
  0 siblings, 2 replies; 33+ messages in thread
From: Minchan Kim @ 2015-10-21  5:28 UTC
  To: Andrew Morton, Kirill A. Shutemov
  Cc: linux-mm, linux-kernel, Hugh Dickins, Rik van Riel, Mel Gorman,
	Michal Hocko, Johannes Weiner, Vlastimil Babka

I am detaching this report from my patchset thread because I still see the
problem below with the MADV_FREE related code removed, and I can reproduce
the same oops with MADV_FREE + recent patches (both my SetPageDirty
and Kirill's pte_mkdirty) within 7 hours.

I cannot be sure it's the THP refcount redesign's problem, but that was
one of the big changes in MM between mmotm-2015-10-06-16-30 and
mmotm-2015-10-15-15-20, so it could be the culprit.

In page_lock_anon_vma_read, root_anon_vma (anon_vma->root) was NULL.
I added VM_BUG_ON_PAGE(!root_anon_vma, page) there and got the result below.

..
..
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
page:ffffea0001b81140 count:3 mapcount:1 mapping:ffff88007e806461 index:0x600001445
page:ffffea0001b87bc0 count:3 mapcount:1 mapping:ffff88007e806461 index:0x6000015ef
flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(1)
page->mem_cgroup:ffff88007f2de000
------------[ cut here ]------------
kernel BUG at mm/rmap.c:517!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880000ce8000 ti: ffff8800ada28000 task.ti: ffff8800ada28000
RIP: 0010:[<ffffffff81128f6e>]  [<ffffffff81128f6e>] page_lock_anon_vma_read+0x18e/0x190
RSP: 0000:ffff8800ada2b868  EFLAGS: 00010296
RAX: 0000000000000021 RBX: ffffea0001b87bc0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000282 RDI: ffffffff81830db0
RBP: ffff8800ada2b888 R08: 0000000000000021 R09: ffff8800ba40eb75
R10: 0000000001ff14bc R11: 0000000000000000 R12: ffff88007e806461
R13: ffff88007e806460 R14: 0000000000000000 R15: ffffffff818464c0
FS:  00007f6d93212740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000600003c14000 CR3: 00000000a674b000 CR4: 00000000000006b0
Stack:
 ffffea0001b87bc0 ffff8800ada2b8f8 ffff88007f2de000 0000000000000000
 ffff8800ada2b8d0 ffffffff81129593 ffff880000000000 ffffffff8105f8c0
 ffffea0001b87bc0 ffff8800ada2b9f8 ffff88007f2de000 0000000000000000
Call Trace:
 [<ffffffff81129593>] rmap_walk+0x1b3/0x3f0
 [<ffffffff8105f8c0>] ? finish_task_switch+0x70/0x260
 [<ffffffff81129973>] page_referenced+0x1a3/0x220
 [<ffffffff81127c10>] ? __page_check_address+0x1d0/0x1d0
 [<ffffffff81128de0>] ? page_get_anon_vma+0xd0/0xd0
 [<ffffffff81127580>] ? anon_vma_ctor+0x40/0x40
 [<ffffffff81103e9e>] shrink_page_list+0x5ce/0xdc0
 [<ffffffff81104d4c>] shrink_inactive_list+0x18c/0x4b0
 [<ffffffff811059af>] shrink_lruvec+0x58f/0x730
 [<ffffffff81105c24>] shrink_zone+0xd4/0x280
 [<ffffffff81105efd>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff8110635d>] try_to_free_mem_cgroup_pages+0x9d/0x120
 [<ffffffff8114e235>] try_charge+0x175/0x720
 [<ffffffff810fdf80>] ? __activate_page+0x230/0x230
 [<ffffffff81152005>] mem_cgroup_try_charge+0x85/0x1d0
 [<ffffffff8111e69a>] handle_mm_fault+0xc9a/0x1000
 [<ffffffff8106215b>] ? __set_cpus_allowed_ptr+0x9b/0x1a0
 [<ffffffff81033629>] __do_page_fault+0x189/0x400
 [<ffffffff810338ac>] do_page_fault+0xc/0x10
 [<ffffffff81428782>] page_fault+0x22/0x30
Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 45 31 f6 41 55 4c
RIP  [<ffffffff81128f6e>] page_lock_anon_vma_read+0x18e/0x190
 RSP <ffff8800ada2b868>
---[ end trace cfbb87f54f12290e ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
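
For anyone decoding the page dump above: the low bits of page->mapping are tag bits, and bit 0 (PAGE_MAPPING_ANON) marks an anonymous page whose remaining bits point at its anon_vma. Below is a minimal Python sketch of that decoding, applied to the mapping value printed in the dump. The helper name decode_mapping is made up for illustration, and this is not kernel code.

```python
PAGE_MAPPING_ANON = 0x1   # bit 0 of page->mapping: anonymous page
PAGE_MAPPING_KSM = 0x2    # bit 1: KSM page (only meaningful together with bit 0)
PAGE_MAPPING_FLAGS = PAGE_MAPPING_ANON | PAGE_MAPPING_KSM


def decode_mapping(mapping):
    """Split a raw page->mapping value into (kind, pointer).

    Hypothetical helper: it only mirrors the tag-bit convention and
    does not look at any other page state.
    """
    flags = mapping & PAGE_MAPPING_FLAGS
    pointer = mapping & ~PAGE_MAPPING_FLAGS
    if flags == PAGE_MAPPING_FLAGS:
        kind = "ksm"
    elif flags & PAGE_MAPPING_ANON:
        kind = "anon"
    else:
        kind = "file"     # untagged: an address_space pointer (or NULL)
    return kind, pointer


# mapping value taken from the page dump in this report
kind, anon_vma = decode_mapping(0xffff88007e806461)
print(kind, hex(anon_vma))   # anon 0xffff88007e806460
```

The decoded pointer is the same value as R13 in the register dump, while R12 holds the raw tagged mapping. That appears consistent with the anon_vma itself being readable and only its root pointer being NULL.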

On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > Hello, it has been a long time since I sent the previous patch.
> > > https://lkml.org/lkml/2015/6/3/37
> > > 
> > > This patch is almost entirely new compared to the previous approach.
> > > I think it is simpler, clearer and easier to review.
> > > 
> > > One thing I should note is that I had tested this patch
> > > and couldn't find any critical problem, so I rebased the patchset
> > > onto recent mmotm (i.e., mmotm-2015-10-15-15-20) to send a formal
> > > patchset. Unfortunately, I started to see sudden discarding of
> > > pages that we shouldn't discard. IOW, an application's valid
> > > anonymous page suddenly disappeared.
> > > 
> > > Looking through the THP changes, I think we could lose the
> > > dirty bit of the pte between freeze_page and unfreeze_page
> > > when we mark it as a migration entry and restore it.
> > > So I added the simple code below without much consideration
> > > and cannot see the problem any more.
> > > I hope it's a good hint for finding the right fix for this problem.
> > > 
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index d5ea516ffb54..e881c04f5950 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -3138,6 +3138,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
> > >  		if (is_write_migration_entry(swp_entry))
> > >  			entry = maybe_mkwrite(entry, vma);
> > >  
> > > +		if (PageDirty(page))
> > > +			SetPageDirty(page);
> > 
> > The PageDirty condition was a typo. I didn't add the condition;
> > I just added:
> > 
> >                 SetPageDirty(page);
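
To make the suspected dirty-bit loss concrete, here is a toy Python model of the hypothesis described in the quoted text. It is a deliberately simplified sketch, not kernel code: the names Page, Pte, freeze, unfreeze and looks_discardable are invented for illustration, and the only property taken from the discussion is that a migration entry records whether the mapping was writable but has no place to keep the pte dirty bit.

```python
from dataclasses import dataclass


@dataclass
class Page:
    dirty: bool = False      # stands in for the page's dirty flag


@dataclass
class Pte:
    write: bool
    dirty: bool


def freeze(pte):
    """Replace the pte with a migration entry that keeps only writability."""
    return {"write": pte.write}   # the dirty bit is not carried over


def unfreeze(entry, page, workaround=False):
    """Rebuild a pte from the migration entry."""
    if workaround:
        page.dirty = True         # models the unconditional SetPageDirty(page)
    return Pte(write=entry["write"], dirty=False)


def looks_discardable(pte, page):
    """Simplified MADV_FREE-style test: clean pte and clean page."""
    return not pte.dirty and not page.dirty


# A written-to page goes through freeze/unfreeze without the workaround.
page = Page()
pte = unfreeze(freeze(Pte(write=True, dirty=True)), page)
print(looks_discardable(pte, page))   # True

# Same round trip with the workaround applied.
page = Page()
pte = unfreeze(freeze(Pte(write=True, dirty=True)), page, workaround=True)
print(looks_discardable(pte, page))   # False
```

In this model the page that held valid data ends up looking clean after the round trip, which is the kind of state that could explain a valid anonymous page being discarded. Marking the page dirty in unfreeze hides that, although it does not show where the dirty bit should really be preserved.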
> 
> As a first step toward finding this bug, I removed all MADV_FREE related
> code from mmotm-2015-10-15-15-20. IOW, git checkout 54bad5da4834
> (arm64: add pmd_[dirty|mkclean] for THP), so the tree doesn't have
> any core MADV_FREE code.
> 
> I tested the following workload on my KVM machine.
> 
> 0. make memcg
> 1. limit memcg
> 2. fork several processes
> 3. each process allocates THP pages and fills them
> 4. increase the limit of the memcg so that swapoff succeeds
> 5. swapoff
> 6. kill all of the processes
> 7. goto 1
> 
> Within a few hours, I encountered the following bug.
> The detailed boot log and dmesg output are attached.
> 
> 
> Initializing cgroup subsys cpu
> Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> KERNEL supported cpus:
>   Intel GenuineIntel
> x86/fpu: Legacy x87 FPU detected.
> x86/fpu: Using 'lazy' FPU context switches.
> e820: BIOS-provided physical RAM map:
> BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> 
> <snip>
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> PGD 0 
> Oops: 0000 [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
> RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> RSP: 0018:ffff88007fea3648  EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
> RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
> RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
> R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
> R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> FS:  00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
> Stack:
>  ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
>  ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
>  ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
> Call Trace:
>  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
>  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
>  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
>  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
>  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
>  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
>  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
>  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
>  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
>  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
>  [<ffffffff811025f0>] shrink_zone+0x90/0x250
>  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [<ffffffff811496c3>] try_charge+0x163/0x700
>  [<ffffffff81149cb4>] mem_cgroup_do_precharge+0x54/0x70
>  [<ffffffff81149e45>] mem_cgroup_can_attach+0x175/0x1b0
>  [<ffffffff811b2c57>] ? kernfs_iattrs.isra.6+0x37/0xd0
>  [<ffffffff81148e70>] ? get_mctgt_type+0x320/0x320
>  [<ffffffff810a9d29>] cgroup_migrate+0x149/0x440
>  [<ffffffff810aa60c>] cgroup_attach_task+0x7c/0xe0
>  [<ffffffff810aa904>] __cgroup_procs_write.isra.33+0x1d4/0x2b0
>  [<ffffffff810aaa10>] cgroup_tasks_write+0x10/0x20
>  [<ffffffff810a6238>] cgroup_file_write+0x38/0xf0
>  [<ffffffff811b54ad>] kernfs_fop_write+0x11d/0x170
>  [<ffffffff81153918>] __vfs_write+0x28/0xe0
>  [<ffffffff8116e614>] ? __fd_install+0x24/0xc0
>  [<ffffffff810784a1>] ? percpu_down_read+0x21/0x50
>  [<ffffffff81153e91>] vfs_write+0xa1/0x170
>  [<ffffffff81154716>] SyS_write+0x46/0xa0
>  [<ffffffff81420a17>] entry_SYSCALL_64_fastpath+0x12/0x6a
> Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
>  RSP <ffff88007fea3648>
> CR2: 0000000000000008
> ---[ end trace e81a82c8122b447d ]---
> Kernel panic - not syncing: Fatal exception
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> PGD 0 
> Oops: 0000 [#2] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 10 PID: 59 Comm: khugepaged Tainted: G      D         4.3.0-rc5-mm1-diet-meta+ #1545
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
> RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
> RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
> RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
> R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
> R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
> Stack:
>  ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
>  ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
>  ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
> Call Trace:
>  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
>  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
>  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
>  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
>  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
>  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
>  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
>  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
>  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
>  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
>  [<ffffffff811025f0>] shrink_zone+0x90/0x250
>  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [<ffffffff811496c3>] try_charge+0x163/0x700
>  [<ffffffff8141d1f3>] ? schedule+0x33/0x80
>  [<ffffffff8114d45f>] mem_cgroup_try_charge+0x9f/0x1d0
>  [<ffffffff811434bc>] khugepaged+0x7cc/0x1ac0
>  [<ffffffff81066e01>] ? hrtick_update+0x1/0x70
>  [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
>  [<ffffffff81142cf0>] ? total_mapcount+0x70/0x70
>  [<ffffffff81056cd9>] kthread+0xc9/0xe0
>  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
>  [<ffffffff81420d6f>] ret_from_fork+0x3f/0x70
>  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
>  RSP <ffff8800b985f778>
> CR2: 0000000000000008
> ---[ end trace e81a82c8122b447e ]---
> Shutting down cpus with NMI
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> 
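
A note on the fault address in the quoted oopses: CR2 and RDI are both 0x8, which is what one would expect if root_anon_vma is NULL and down_read_trylock() is handed &root_anon_vma->rwsem. The sketch below models only the first two fields of struct anon_vma with ctypes to show where the 8 comes from. The class and function names are illustrative, and the layout (root pointer first, rwsem immediately after, 64-bit build) is an assumption about this kernel rather than something shown in the log.

```python
import ctypes


class AnonVmaHead(ctypes.Structure):
    """Assumed start of struct anon_vma on a 64-bit build."""
    _fields_ = [
        ("root", ctypes.c_uint64),        # struct anon_vma *root
        ("rwsem_count", ctypes.c_int64),  # first word of struct rw_semaphore
    ]


def fault_address(base):
    """Address read when loading the rwsem count through pointer `base`."""
    return base + AnonVmaHead.rwsem_count.offset


# root_anon_vma == NULL, so the first load in down_read_trylock() hits 0x8
print(hex(fault_address(0)))   # 0x8
```

The faulting instruction bytes in the Code: line, `<48> 8b 07`, decode to a load from (%rdi), which fits a read of the semaphore count with RDI = 8.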

> QEMU 2.0.0 monitor - type 'help' for more information
> (qemu) early console in setup code
> Initializing cgroup subsys cpu
> Linux version 4.3.0-rc5-mm1-diet-meta+ (barrios@bbox) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) ) #1545 SMP Tue Oct 20 08:55:45 KST 2015
> Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> KERNEL supported cpus:
>   Intel GenuineIntel
> x86/fpu: Legacy x87 FPU detected.
> x86/fpu: Using 'lazy' FPU context switches.
> e820: BIOS-provided physical RAM map:
> BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> bootconsole [earlyser0] enabled
> debug: ignoring loglevel setting.
> NX (Execute Disable) protection: active
> SMBIOS 2.4 present.
> DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> e820: remove [mem 0x000a0000-0x000fffff] usable
> e820: last_pfn = 0xbfffc max_arch_pfn = 0x400000000
> MTRR default type: write-back
> MTRR fixed ranges enabled:
>   00000-9FFFF write-back
>   A0000-BFFFF uncachable
>   C0000-FFFFF write-protect
> MTRR variable ranges enabled:
>   0 base 00C0000000 mask FFC0000000 uncachable
>   1 disabled
>   2 disabled
>   3 disabled
>   4 disabled
>   5 disabled
>   6 disabled
>   7 disabled
> x86/PAT: PAT not supported by CPU.
> Scan for SMP in [mem 0x00000000-0x000003ff]
> Scan for SMP in [mem 0x0009fc00-0x0009ffff]
> Scan for SMP in [mem 0x000f0000-0x000fffff]
> found SMP MP-table at [mem 0x000f0a70-0x000f0a7f] mapped at [ffff8800000f0a70]
>   mpc: f0a80-f0c44
> Scanning 1 areas for low memory corruption
> Base memory trampoline at [ffff880000099000] 99000 size 24576
> init_memory_mapping: [mem 0x00000000-0x000fffff]
>  [mem 0x00000000-0x000fffff] page 4k
> BRK [0x0220e000, 0x0220efff] PGTABLE
> BRK [0x0220f000, 0x0220ffff] PGTABLE
> BRK [0x02210000, 0x02210fff] PGTABLE
> init_memory_mapping: [mem 0xbfc00000-0xbfdfffff]
>  [mem 0xbfc00000-0xbfdfffff] page 2M
> BRK [0x02211000, 0x02211fff] PGTABLE
> init_memory_mapping: [mem 0xa0000000-0xbfbfffff]
>  [mem 0xa0000000-0xbfbfffff] page 2M
> init_memory_mapping: [mem 0x80000000-0x9fffffff]
>  [mem 0x80000000-0x9fffffff] page 2M
> init_memory_mapping: [mem 0x00100000-0x7fffffff]
>  [mem 0x00100000-0x001fffff] page 4k
>  [mem 0x00200000-0x7fffffff] page 2M
> init_memory_mapping: [mem 0xbfe00000-0xbfffbfff]
>  [mem 0xbfe00000-0xbfffbfff] page 4k
> BRK [0x02212000, 0x02212fff] PGTABLE
> RAMDISK: [mem 0x7851a000-0x7fffffff]
>  [ffffea0000000000-ffffea0002ffffff] PMD -> [ffff8800bc400000-ffff8800bf3fffff] on node 0
> Zone ranges:
>   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>   DMA32    [mem 0x0000000001000000-0x00000000bfffbfff]
>   Normal   empty
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000000001000-0x000000000009efff]
>   node   0: [mem 0x0000000000100000-0x00000000bfffbfff]
> Initmem setup node 0 [mem 0x0000000000001000-0x00000000bfffbfff]
> On node 0 totalpages: 786330
>   DMA zone: 64 pages used for memmap
>   DMA zone: 21 pages reserved
>   DMA zone: 3998 pages, LIFO batch:0
>   DMA32 zone: 12224 pages used for memmap
>   DMA32 zone: 782332 pages, LIFO batch:31
> Intel MultiProcessor Specification v1.4
>   mpc: f0a80-f0c44
> MPTABLE: OEM ID: BOCHSCPU
> MPTABLE: Product ID: 0.1         
> MPTABLE: APIC at: 0xFEE00000
> mapped APIC to ffffffffff5fd000 (        fee00000)
> Processor #0 (Bootup-CPU)
> Processor #1
> Processor #2
> Processor #3
> Processor #4
> Processor #5
> Processor #6
> Processor #7
> Processor #8
> Processor #9
> Processor #10
> Processor #11
> Bus #0 is PCI   
> Bus #1 is ISA   
> IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> Int: type 0, pol 1, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 09
> Int: type 0, pol 1, trig 0, bus 00, IRQ 0c, APIC ID 0, APIC INT 0b
> Int: type 0, pol 1, trig 0, bus 00, IRQ 10, APIC ID 0, APIC INT 0b
> Int: type 0, pol 1, trig 0, bus 00, IRQ 14, APIC ID 0, APIC INT 0a
> Int: type 0, pol 1, trig 0, bus 00, IRQ 18, APIC ID 0, APIC INT 0a
> Int: type 0, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC INT 02
> Int: type 0, pol 0, trig 0, bus 01, IRQ 01, APIC ID 0, APIC INT 01
> Int: type 0, pol 0, trig 0, bus 01, IRQ 03, APIC ID 0, APIC INT 03
> Int: type 0, pol 0, trig 0, bus 01, IRQ 04, APIC ID 0, APIC INT 04
> Int: type 0, pol 0, trig 0, bus 01, IRQ 06, APIC ID 0, APIC INT 06
> Int: type 0, pol 0, trig 0, bus 01, IRQ 07, APIC ID 0, APIC INT 07
> Int: type 0, pol 0, trig 0, bus 01, IRQ 08, APIC ID 0, APIC INT 08
> Int: type 0, pol 0, trig 0, bus 01, IRQ 0c, APIC ID 0, APIC INT 0c
> Int: type 0, pol 0, trig 0, bus 01, IRQ 0d, APIC ID 0, APIC INT 0d
> Int: type 0, pol 0, trig 0, bus 01, IRQ 0e, APIC ID 0, APIC INT 0e
> Int: type 0, pol 0, trig 0, bus 01, IRQ 0f, APIC ID 0, APIC INT 0f
> Lint: type 3, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC LINT 00
> Lint: type 1, pol 0, trig 0, bus 01, IRQ 00, APIC ID ff, APIC LINT 01
> Processors: 12
> smpboot: Allowing 12 CPUs, 0 hotplug CPUs
> mapped IOAPIC to ffffffffff5fc000 (fec00000)
> e820: [mem 0xc0000000-0xfeffbfff] available for PCI devices
> clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> setup_percpu: NR_CPUS:16 nr_cpumask_bits:16 nr_cpu_ids:12 nr_node_ids:1
> PERCPU: Embedded 31 pages/cpu @ffff8800bfa00000 s87640 r8192 d31144 u131072
> pcpu-alloc: s87640 r8192 d31144 u131072 alloc=1*2097152
> pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 -- -- -- -- 
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 774021
> Kernel command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> sysrq: sysrq always enabled.
> log_buf_len individual max cpu contribution: 2097152 bytes
> log_buf_len total cpu_extra contributions: 23068672 bytes
> log_buf_len min size: 8388608 bytes
> log_buf_len: 33554432 bytes
> early log buf free: 8380096(99%)
> PID hash table entries: 4096 (order: 3, 32768 bytes)
> Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> Memory: 2911172K/3145320K available (4237K kernel code, 721K rwdata, 1988K rodata, 936K init, 8608K bss, 234148K reserved, 0K cma-reserved)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=12, Nodes=1
> Hierarchical RCU implementation.
> 	Build-time adjustment of leaf fanout to 64.
> 	RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=12.
> RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=12
> NR_IRQS:4352 nr_irqs:136 16
> Console: colour VGA+ 80x25
> console [tty0] enabled
> bootconsole [earlyser0] disabled
> console [ttyS0] enabled
> tsc: Fast TSC calibration using PIT
> tsc: Detected 3199.926 MHz processor
> Calibrating delay loop (skipped), value calculated using timer frequency.. 6399.85 BogoMIPS (lpj=12799704)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
> Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
> Initializing cgroup subsys memory
> Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> Freeing SMP alternatives memory: 20K (ffffffff819a0000 - ffffffff819a5000)
> ftrace: allocating 16664 entries in 66 pages
> Switched APIC routing to physical flat.
> enabled ExtINT on CPU#0
> ENABLING IO-APIC IRQs
> init IO_APIC IRQs
>  apic 0 pin 0 not connected
> IOAPIC[0]: Set routing entry (0-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-4 -> 0x34 -> IRQ 4 Mode:0 Active:0 Dest:0)
>  apic 0 pin 5 not connected
> IOAPIC[0]: Set routing entry (0-6 -> 0x36 -> IRQ 6 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-7 -> 0x37 -> IRQ 7 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-8 -> 0x38 -> IRQ 8 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-9 -> 0x39 -> IRQ 9 Mode:1 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-10 -> 0x3a -> IRQ 10 Mode:1 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-11 -> 0x3b -> IRQ 11 Mode:1 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-12 -> 0x3c -> IRQ 12 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-13 -> 0x3d -> IRQ 13 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-14 -> 0x3e -> IRQ 14 Mode:0 Active:0 Dest:0)
> IOAPIC[0]: Set routing entry (0-15 -> 0x3f -> IRQ 15 Mode:0 Active:0 Dest:0)
>  apic 0 pin 16 not connected
>  apic 0 pin 17 not connected
>  apic 0 pin 18 not connected
>  apic 0 pin 19 not connected
>  apic 0 pin 20 not connected
>  apic 0 pin 21 not connected
>  apic 0 pin 22 not connected
>  apic 0 pin 23 not connected
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ... lapic delta = 6251755
> ..... delta 6251755
> ..... mult: 268510832
> ..... calibration result: 4001123
> ..... CPU clock speed is 3200.3592 MHz.
> ..... host bus clock speed is 1000.1123 MHz.
> ... verify APIC timer
> ... jiffies delta = 25
> ... jiffies result ok
> smpboot: CPU0: Intel QEMU Virtual CPU version 2.0.0 (family: 0x6, model: 0x6, stepping: 0x3)
> Performance Events: Broken PMU hardware detected, using software events only.
> Failed to access perfctr msr (MSR c2 is 0)
> x86: Booting SMP configuration:
> .... node  #0, CPUs:        #1
> masked ExtINT on CPU#1
>   #2
> masked ExtINT on CPU#2
>   #3
> masked ExtINT on CPU#3
>   #4
> masked ExtINT on CPU#4
>   #5
> masked ExtINT on CPU#5
>   #6
> masked ExtINT on CPU#6
>   #7
> masked ExtINT on CPU#7
>   #8
> masked ExtINT on CPU#8
>   #9
> masked ExtINT on CPU#9
>  #10
> masked ExtINT on CPU#10
>  #11
> masked ExtINT on CPU#11
> x86: Booted up 1 node, 12 CPUs
> smpboot: Total of 12 processors activated (76818.13 BogoMIPS)
> devtmpfs: initialized
> clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> NET: Registered protocol family 16
> PCI: Using configuration type 1 for base access
> vgaarb: loaded
> SCSI subsystem initialized
> libata version 3.00 loaded.
> PCI: Probing PCI hardware
> PCI: root bus 00: using default resources
> PCI: Probing PCI hardware (bus 00)
> PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
> pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffffff]
> pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
> pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
> pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
> pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
> pci 0000:00:01.1: reg 0x20: [io  0xc0c0-0xc0cf]
> pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
> pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
> pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
> pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
> pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
> pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
> pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
> pci 0000:00:02.0: reg 0x14: [mem 0xfebd0000-0xfebd0fff]
> pci 0000:00:02.0: reg 0x30: [mem 0xfebc0000-0xfebcffff pref]
> vgaarb: setting as boot device: PCI:0000:00:02.0
> vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
> pci 0000:00:03.0: reg 0x10: [io  0xc080-0xc09f]
> pci 0000:00:03.0: reg 0x14: [mem 0xfebd1000-0xfebd1fff]
> pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref]
> pci 0000:00:04.0: [1af4:1002] type 00 class 0x00ff00
> pci 0000:00:04.0: reg 0x10: [io  0xc0a0-0xc0bf]
> pci 0000:00:05.0: [1af4:1001] type 00 class 0x010000
> pci 0000:00:05.0: reg 0x10: [io  0xc000-0xc03f]
> pci 0000:00:05.0: reg 0x14: [mem 0xfebd2000-0xfebd2fff]
> pci 0000:00:06.0: [1af4:1001] type 00 class 0x010000
> pci 0000:00:06.0: reg 0x10: [io  0xc040-0xc07f]
> pci 0000:00:06.0: reg 0x14: [mem 0xfebd3000-0xfebd3fff]
> pci 0000:00:07.0: [8086:25ab] type 00 class 0x088000
> pci 0000:00:07.0: reg 0x10: [mem 0xfebd4000-0xfebd400f]
> pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
> pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> PCI: pci_cache_line_size set to 64 bytes
> e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
> e820: reserve RAM buffer [mem 0xbfffc000-0xbfffffff]
> clocksource: Switched to clocksource refined-jiffies
> pci_bus 0000:00: resource 4 [io  0x0000-0xffff]
> pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffffff]
> NET: Registered protocol family 2
> TCP established hash table entries: 32768 (order: 6, 262144 bytes)
> TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
> TCP: Hash tables configured (established 32768 bind 32768)
> UDP hash table entries: 2048 (order: 4, 65536 bytes)
> UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
> NET: Registered protocol family 1
> Trying to unpack rootfs image as initramfs...
> Freeing initrd memory: 125848K (ffff88007851a000 - ffff880080000000)
> platform rtc_cmos: registered platform RTC device (no PNP device found)
> Scanning for low memory corruption every 60 seconds
> futex hash table entries: 4096 (order: 6, 262144 bytes)
> HugeTLB registered 2 MB page size, pre-allocated 0 pages
> fuse init (API version 7.23)
> 9p: Installing v9fs 9p2000 file system support
> cryptomgr_test (74) used greatest stack depth: 15352 bytes left
> cryptomgr_test (82) used greatest stack depth: 15136 bytes left
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
> io scheduler noop registered
> io scheduler deadline registered
> io scheduler cfq registered (default)
> querying PCI -> IRQ mapping bus:0, slot:3, pin:0.
> virtio-pci 0000:00:03.0: PCI->APIC IRQ transform: INT A -> IRQ 11
> virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
> querying PCI -> IRQ mapping bus:0, slot:4, pin:0.
> virtio-pci 0000:00:04.0: PCI->APIC IRQ transform: INT A -> IRQ 11
> virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
> querying PCI -> IRQ mapping bus:0, slot:5, pin:0.
> virtio-pci 0000:00:05.0: PCI->APIC IRQ transform: INT A -> IRQ 10
> virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
> querying PCI -> IRQ mapping bus:0, slot:6, pin:0.
> virtio-pci 0000:00:06.0: PCI->APIC IRQ transform: INT A -> IRQ 10
> virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
> Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
> serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> Linux agpgart interface v0.103
> brd: module loaded
> loop: module loaded
>  vda: vda1 vda2 < vda5 >
> zram: Added device: zram0
> libphy: Fixed MDIO Bus: probed
> tun: Universal TUN/TAP device driver, 1.6
> tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> serio: i8042 KBD port at 0x60,0x64 irq 1
> serio: i8042 AUX port at 0x60,0x64 irq 12
> mousedev: PS/2 mouse device common for all mice
> rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> rtc_cmos rtc_cmos: alarms up to one day, 114 bytes nvram
> device-mapper: ioctl: 4.33.0-ioctl (2015-8-18) initialised: dm-devel@redhat.com
> device-mapper: cache cleaner: version 1.0.0 loaded
> NET: Registered protocol family 17
> 9pnet: Installing 9P2000 support
> ... APIC ID:      00000000 (0)
> ... APIC VERSION: 01050014
> 0000000000000000000000000000000000000000000000000000000000000000
> 000000000e000000000000000000000000000000000000000000000000000000
> 0000000000020000000000000000000000000000000000000000000000008000
> 
> number of MP IRQ sources: 16.
> number of IO-APIC #0 registers: 24.
> testing the IO APIC.......................
> IO APIC #0......
> .... register #00: 00000000
> .......    : physical APIC id: 00
> .......    : Delivery Type: 0
> .......    : LTS          : 0
> .... register #01: 00170011
> .......     : max redirection entries: 17
> .......     : PRQ implemented: 0
> .......     : IO APIC version: 11
> .... register #02: 00000000
> .......     : arbitration: 00
> .... IRQ redirection table:
> IOAPIC 0:
>  pin00, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin01, enabled , edge , high, V(31), IRR(0), S(0), physical, D(00), M(0)
>  pin02, enabled , edge , high, V(30), IRR(0), S(0), physical, D(00), M(0)
>  pin03, enabled , edge , high, V(33), IRR(0), S(0), physical, D(00), M(0)
>  pin04, disabled, edge , high, V(34), IRR(0), S(0), physical, D(00), M(0)
>  pin05, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin06, enabled , edge , high, V(36), IRR(0), S(0), physical, D(00), M(0)
>  pin07, enabled , edge , high, V(37), IRR(0), S(0), physical, D(00), M(0)
>  pin08, enabled , edge , high, V(38), IRR(0), S(0), physical, D(00), M(0)
>  pin09, disabled, level, high, V(39), IRR(0), S(0), physical, D(00), M(0)
>  pin0a, enabled , level, high, V(3A), IRR(0), S(0), physical, D(00), M(0)
>  pin0b, enabled , level, high, V(3B), IRR(0), S(0), physical, D(00), M(0)
>  pin0c, enabled , edge , high, V(3C), IRR(0), S(0), physical, D(00), M(0)
>  pin0d, enabled , edge , high, V(3D), IRR(0), S(0), physical, D(00), M(0)
>  pin0e, enabled , edge , high, V(3E), IRR(0), S(0), physical, D(00), M(0)
>  pin0f, enabled , edge , high, V(3F), IRR(0), S(0), physical, D(00), M(0)
>  pin10, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin11, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin12, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin13, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin14, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin15, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin16, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
>  pin17, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> IRQ to pin mappings:
> IRQ0 -> 0:2
> IRQ1 -> 0:1
> IRQ3 -> 0:3
> IRQ4 -> 0:4
> IRQ6 -> 0:6
> IRQ7 -> 0:7
> IRQ8 -> 0:8
> IRQ9 -> 0:9
> IRQ10 -> 0:10
> IRQ11 -> 0:11
> IRQ12 -> 0:12
> IRQ13 -> 0:13
> IRQ14 -> 0:14
> IRQ15 -> 0:15
> .................................... done.
> rtc_cmos rtc_cmos: setting system clock to 2015-10-20 08:57:55 UTC (1445331475)
> input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> Freeing unused kernel memory: 936K (ffffffff818b6000 - ffffffff819a0000)
> Write protecting the kernel read-only data: 8192k
> Freeing unused kernel memory: 1900K (ffff880001425000 - ffff880001600000)
> Freeing unused kernel memory: 60K (ffff8800017f1000 - ffff880001800000)
> busybox (117) used greatest stack depth: 14480 bytes left
> exe (124) used greatest stack depth: 14024 bytes left
> udevd[140]: starting version 175
> blkid (151) used greatest stack depth: 13920 bytes left
> modprobe (242) used greatest stack depth: 13784 bytes left
> clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2e200418439, max_idle_ns: 440795220848 ns
> clocksource: Switched to clocksource tsc
> EXT4-fs (vda1): recovery complete
> EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
> exe (262) used greatest stack depth: 13032 bytes left
> random: init urandom read with 9 bits of entropy available
> init: plymouth-upstart-bridge main process (279) terminated with status 1
> init: plymouth-upstart-bridge main process ended, respawning
> init: plymouth-upstart-bridge main process (289) terminated with status 1
> init: plymouth-upstart-bridge main process ended, respawning
> init: plymouth-upstart-bridge main process (293) terminated with status 1
> init: plymouth-upstart-bridge main process ended, respawning
> init: ureadahead main process (282) terminated with status 5
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> systemd-udevd[423]: starting version 204
> EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
>  * Stopping Send an event to indicate plymouth is up^[[74G[ OK ]
>  * Starting Mount filesystems on boot^[[74G[ OK ]
>  * Starting Signal sysvinit that the rootfs is mounted^[[74G[ OK ]
>  * Starting Populate /dev filesystem^[[74G[ OK ]
>  * Starting Populate and link to /run filesystem^[[74G[ OK ]
>  * Stopping Populate /dev filesystem^[[74G[ OK ]
>  * Stopping Populate and link to /run filesystem^[[74G[ OK ]
>  * Starting Clean /tmp directory^[[74G[ OK ]
>  * Stopping Track if upstart is running in a container^[[74G[ OK ]
>  * Stopping Clean /tmp directory^[[74G[ OK ]
>  * Starting Initialize or finalize resolvconf^[[74G[ OK ]
>  * Starting set console keymap^[[74G[ OK ]
>  * Starting Signal sysvinit that virtual filesystems are mounted^[[74G[ OK ]
>  * Starting Signal sysvinit that virtual filesystems are mounted^[[74G[ OK ]
>  * Starting Bridge udev events into upstart^[[74G[ OK ]
>  * Starting Signal sysvinit that remote filesystems are mounted^[[74G[ OK ]
>  * Stopping set console keymap^[[74G[ OK ]
>  * Starting device node and kernel event manager^[[74G[ OK ]
>  * Starting load modules from /etc/modules^[[74G[ OK ]
>  * Starting cold plug devices^[[74G[ OK ]
>  * Starting log initial device creation^[[74G[ OK ]
>  * Stopping Read required files in advance (for other mountpoints)^[[74G[ OK ]
>  * Stopping load modules from /etc/modules^[[74G[ OK ]
>  * Starting Signal sysvinit that local filesystems are mounted^[[74G[ OK ]
>  * Starting flush early job output to logs^[[74G[ OK ]
>  * Stopping Mount filesystems on boot^[[74G[ OK ]
>  * Stopping flush early job output to logs^[[74G[ OK ]
>  * Starting D-Bus system message bus^[[74G[ OK ]
>  * Starting SystemD login management service^[[74G[ OK ]
>  * Starting system logging daemon^[[74G[ OK ]
>  * Stopping cold plug devices^[[74G[ OK ]
>  * Starting Uncomplicated firewall^[[74G[ OK ]
>  * Starting configure network device security^[[74G[ OK ]
>  * Stopping log initial device creation^[[74G[ OK ]
>  * Starting configure network device security^[[74G[ OK ]
>  * Starting save udev log and update rules^[[74G[ OK ]
>  * Starting set console font^[[74G[ OK ]
>  * Stopping save udev log and update rules^[[74G[ OK ]
>  * Starting Mount network filesystems^[[74G[ OK ]
>  * Starting Failsafe Boot Delay^[[74G[ OK ]
>  * Starting configure network device security^[[74G[ OK ]
>  * Stopping Mount network filesystems^[[74G[ OK ]
>  * Starting configure network device^[[74G[ OK ]
>  * Starting configure network device^[[74G[ OK ]
>  * Starting Bridge file events into upstart^[[74G[ OK ]
>  * Starting Bridge socket events into upstart^[[74G[ OK ]
>  * Stopping set console font^[[74G[ OK ]
>  * Starting userspace bootsplash^[[74G[ OK ]
>  * Starting Send an event to indicate plymouth is up^[[74G[ OK ]
>  * Stopping userspace bootsplash^[[74G[ OK ]
>  * Stopping Send an event to indicate plymouth is up^[[74G[ OK ]
>  * Starting Mount network filesystems^[[74G[ OK ]
> init: failsafe main process (591) killed by TERM signal
>  * Stopping Failsafe Boot Delay^[[74G[ OK ]
>  * Starting System V initialisation compatibility^[[74G[ OK ]
>  * Stopping Mount network filesystems^[[74G[ OK ]
>  * Starting configure virtual network devices^[[74G[ OK ]
>  * Stopping System V initialisation compatibility^[[74G[ OK ]
>  * Starting System V runlevel compatibility^[[74G[ OK ]
>  * Starting deferred execution scheduler^[[74G[ OK ]
>  * Starting regular background program processing daemon^[[74G[ OK ]
>  * Starting ACPI daemon^[[74G[ OK ]
>  * Starting save kernel messages^[[74G[ OK ]
>  * Starting CPU interrupts balancing daemon^[[74G[ OK ]
>  * Stopping save kernel messages^[[74G[ OK ]
>  * Starting OpenSSH server^[[74G[ OK ]
>  * Starting automatic crash report generation^[[74G[ OK ]
>  * Restoring resolver state...       ^[[80G ^[[74G[ OK ]
> eth0      Link encap:Ethernet  HWaddr 52:54:79:12:34:57
>           inet addr:192.168.0.21  Bcast:192.168.0.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:34 errors:0 dropped:24 overruns:0 frame:0
>           TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:5780 (5.7 KB)  TX bytes:800 (800.0 B)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>  * Stopping System V runlevel compatibility^[[74G[ OK ]
> init: plymouth-upstart-bridge main process ended, respawning
> sh (1429) used greatest stack depth: 11752 bytes left
> sh (1454) used greatest stack depth: 11528 bytes left
> random: nonblocking pool is initialized
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> sh (2785) used greatest stack depth: 11480 bytes left
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> PGD 0 
> Oops: 0000 [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
> RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> RSP: 0018:ffff88007fea3648  EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
> RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
> RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
> R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
> R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> FS:  00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
> Stack:
>  ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
>  ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
>  ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
> Call Trace:
>  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
>  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
>  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
>  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
>  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
>  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
>  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
>  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
>  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
>  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
>  [<ffffffff811025f0>] shrink_zone+0x90/0x250
>  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [<ffffffff811496c3>] try_charge+0x163/0x700
>  [<ffffffff81149cb4>] mem_cgroup_do_precharge+0x54/0x70
>  [<ffffffff81149e45>] mem_cgroup_can_attach+0x175/0x1b0
>  [<ffffffff811b2c57>] ? kernfs_iattrs.isra.6+0x37/0xd0
>  [<ffffffff81148e70>] ? get_mctgt_type+0x320/0x320
>  [<ffffffff810a9d29>] cgroup_migrate+0x149/0x440
>  [<ffffffff810aa60c>] cgroup_attach_task+0x7c/0xe0
>  [<ffffffff810aa904>] __cgroup_procs_write.isra.33+0x1d4/0x2b0
>  [<ffffffff810aaa10>] cgroup_tasks_write+0x10/0x20
>  [<ffffffff810a6238>] cgroup_file_write+0x38/0xf0
>  [<ffffffff811b54ad>] kernfs_fop_write+0x11d/0x170
>  [<ffffffff81153918>] __vfs_write+0x28/0xe0
>  [<ffffffff8116e614>] ? __fd_install+0x24/0xc0
>  [<ffffffff810784a1>] ? percpu_down_read+0x21/0x50
>  [<ffffffff81153e91>] vfs_write+0xa1/0x170
>  [<ffffffff81154716>] SyS_write+0x46/0xa0
>  [<ffffffff81420a17>] entry_SYSCALL_64_fastpath+0x12/0x6a
> Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
>  RSP <ffff88007fea3648>
> CR2: 0000000000000008
> BUG: unable to handle kernel ---[ end trace e81a82c8122b447d ]---
> Kernel panic - not syncing: Fatal exception
> 
> NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> PGD 0 
> Oops: 0000 [#2] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 10 PID: 59 Comm: khugepaged Tainted: G      D         4.3.0-rc5-mm1-diet-meta+ #1545
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
> RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
> RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
> RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
> R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
> R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
> Stack:
>  ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
>  ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
>  ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
> Call Trace:
>  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
>  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
>  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
>  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
>  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
>  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
>  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
>  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
>  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
>  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
>  [<ffffffff811025f0>] shrink_zone+0x90/0x250
>  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [<ffffffff811496c3>] try_charge+0x163/0x700
>  [<ffffffff8141d1f3>] ? schedule+0x33/0x80
>  [<ffffffff8114d45f>] mem_cgroup_try_charge+0x9f/0x1d0
>  [<ffffffff811434bc>] khugepaged+0x7cc/0x1ac0
>  [<ffffffff81066e01>] ? hrtick_update+0x1/0x70
>  [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
>  [<ffffffff81142cf0>] ? total_mapcount+0x70/0x70
>  [<ffffffff81056cd9>] kthread+0xc9/0xe0
>  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
>  [<ffffffff81420d6f>] ret_from_fork+0x3f/0x70
>  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
>  RSP <ffff8800b985f778>
> CR2: 0000000000000008
> ---[ end trace e81a82c8122b447e ]---
> Shutting down cpus with NMI
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-21  5:28 kernel oops on mmotm-2015-10-15-15-20 Minchan Kim
@ 2015-10-21 11:07 ` Kirill A. Shutemov
  2015-10-22  0:06   ` Minchan Kim
  2015-10-22  2:15 ` Hugh Dickins
  1 sibling, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-10-21 11:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Hugh Dickins,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> I detach this report from my patchset thread because I see below
> problem with removing MADV_FREE related code and I can reproduce
> same oops with MADV_FREE + recent patches(both my SetPageDirty
> and Kirill's pte_mkdirty) within 7 hours.

Could you share code for your workload?

> I can not be sure it's THP refcount redesign's problem but it was
> one of big change in MM between mmotm-2015-10-15-15-20 and
> mmotm-2015-10-06-16-30 so it could be a culprit.
> 
> In page_lock_anon_vma_read, anon_vma_root was NULL.
> I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.

Hm. That's tricky.. :-/

Could you please dump anon_vma->refcount too?

I have a vague suspicion that I'm screwing up anon_vma refcounting during
split_huge_page.

It would be great to see if the page was part of THP before.

> 
> ..
> ..
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ffffea0001b81140 count:3 mapcount:1 mapping:ffff88007e806461 index:0x600001445
> page:ffffea0001b87bc0 count:3 mapcount:1 mapping:ffff88007e806461 index:0x6000015ef
> flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:ffff88007f2de000
> ------------[ cut here ]------------
> kernel BUG at mm/rmap.c:517!
> invalid opcode: 0000 [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff880000ce8000 ti: ffff8800ada28000 task.ti: ffff8800ada28000
> RIP: 0010:[<ffffffff81128f6e>]  [<ffffffff81128f6e>] page_lock_anon_vma_read+0x18e/0x190
> RSP: 0000:ffff8800ada2b868  EFLAGS: 00010296
> RAX: 0000000000000021 RBX: ffffea0001b87bc0 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 0000000000000282 RDI: ffffffff81830db0
> RBP: ffff8800ada2b888 R08: 0000000000000021 R09: ffff8800ba40eb75
> R10: 0000000001ff14bc R11: 0000000000000000 R12: ffff88007e806461
> R13: ffff88007e806460 R14: 0000000000000000 R15: ffffffff818464c0
> FS:  00007f6d93212740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000600003c14000 CR3: 00000000a674b000 CR4: 00000000000006b0
> Stack:
>  ffffea0001b87bc0 ffff8800ada2b8f8 ffff88007f2de000 0000000000000000
>  ffff8800ada2b8d0 ffffffff81129593 ffff880000000000 ffffffff8105f8c0
>  ffffea0001b87bc0 ffff8800ada2b9f8 ffff88007f2de000 0000000000000000
> Call Trace:
>  [<ffffffff81129593>] rmap_walk+0x1b3/0x3f0
>  [<ffffffff8105f8c0>] ? finish_task_switch+0x70/0x260
>  [<ffffffff81129973>] page_referenced+0x1a3/0x220
>  [<ffffffff81127c10>] ? __page_check_address+0x1d0/0x1d0
>  [<ffffffff81128de0>] ? page_get_anon_vma+0xd0/0xd0
>  [<ffffffff81127580>] ? anon_vma_ctor+0x40/0x40
>  [<ffffffff81103e9e>] shrink_page_list+0x5ce/0xdc0
>  [<ffffffff81104d4c>] shrink_inactive_list+0x18c/0x4b0
>  [<ffffffff811059af>] shrink_lruvec+0x58f/0x730
>  [<ffffffff81105c24>] shrink_zone+0xd4/0x280
>  [<ffffffff81105efd>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff8110635d>] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [<ffffffff8114e235>] try_charge+0x175/0x720
>  [<ffffffff810fdf80>] ? __activate_page+0x230/0x230
>  [<ffffffff81152005>] mem_cgroup_try_charge+0x85/0x1d0
>  [<ffffffff8111e69a>] handle_mm_fault+0xc9a/0x1000
>  [<ffffffff8106215b>] ? __set_cpus_allowed_ptr+0x9b/0x1a0
>  [<ffffffff81033629>] __do_page_fault+0x189/0x400
>  [<ffffffff810338ac>] do_page_fault+0xc/0x10
>  [<ffffffff81428782>] page_fault+0x22/0x30
> Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 45 31 f6 
> 41 55 4c 
> RIP  [<ffffffff81128f6e>] page_lock_anon_vma_read+0x18e/0x190
>  RSP <ffff8800ada2b868>
> ---[ end trace cfbb87f54f12290e ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
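
For reading the dump above: the mapping value printed for both pages ends
in 1, R12 holds the same value, and R13 holds that value minus one. That is
consistent with how anonymous pages tag page->mapping with a low bit. Below is
a minimal userspace sketch of that decoding, assuming the 4.3-era constants
PAGE_MAPPING_ANON and PAGE_MAPPING_KSM; the helper name mapping_to_anon_vma()
is made up for illustration and is not a kernel function.

```c
#include <stdint.h>

/* Low bits of page->mapping used as type tags (values assumed from 4.3-era mm.h). */
#define PAGE_MAPPING_ANON  UINT64_C(1)
#define PAGE_MAPPING_KSM   UINT64_C(2)
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/*
 * Hypothetical helper: return the anon_vma address encoded in a
 * page->mapping value, or 0 if the value is not a plain anonymous mapping.
 */
static uint64_t mapping_to_anon_vma(uint64_t mapping)
{
	if ((mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
		return 0;
	return mapping - PAGE_MAPPING_ANON;
}
```

Feeding it the mapping value from the dump (0xffff88007e806461) yields
0xffff88007e806460, the value seen in R13.
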
> 
> On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > > Hello, it's too late since I sent previos patch.
> > > > https://lkml.org/lkml/2015/6/3/37
> > > > 
> > > > This patch is alomost new compared to previos approach.
> > > > I think this is more simple, clear and easy to review.
> > > > 
> > > > One thing I should notice is that I have tested this patch
> > > > and couldn't find any critical problem so I rebased patchset
> > > > onto recent mmotm(ie, mmotm-2015-10-15-15-20) to send formal
> > > > patchset. Unfortunately, I start to see sudden discarding of
> > > > the page we shouldn't do. IOW, application's valid anonymous page
> > > > was disappeared suddenly.
> > > > 
> > > > When I look through THP changes, I think we could lose
> > > > dirty bit of pte between freeze_page and unfreeze_page
> > > > when we mark it as migration entry and restore it.
> > > > So, I added below simple code without enough considering
> > > > and cannot see the problem any more.
> > > > I hope it's good hint to find right fix this problem.
> > > > 
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > index d5ea516ffb54..e881c04f5950 100644
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -3138,6 +3138,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
> > > >  		if (is_write_migration_entry(swp_entry))
> > > >  			entry = maybe_mkwrite(entry, vma);
> > > >  
> > > > +		if (PageDirty(page))
> > > > +			SetPageDirty(page);
> > > 
> > > The condition of PageDirty was typo. I didn't add the condition.
> > > Just added.
> > > 
> > >                 SetPageDirty(page);
> > 
> > For the first step to find this bug, I removed all MADV_FREE related
> > code in mmotm-2015-10-15-15-20. IOW, git checkout 54bad5da4834
> > (arm64: add pmd_[dirty|mkclean] for THP) so the tree doesn't have
> > any core code of MADV_FREE.
> > 
> > I tested following workloads in my KVM machine.
> > 
> > 0. make memcg
> > 1. limit memcg
> > 2. fork several processes
> > 3. each process allocates THP page and fill
> > 4. increase limit of the memcg to swapoff successfully
> > 5. swapoff
> > 6. kill all of processes
> > 7. goto 1
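
Steps 2 and 3 of the list above can be sketched in userspace roughly as
follows. This is not the actual test program, which is not included in this
thread; the function names fill_anon_region() and run_workers() are
hypothetical, the memcg and swapoff steps are left out because they need root
and cgroup setup, and the children simply exit instead of being killed.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map len bytes of anonymous memory, request THP, and touch every byte. */
static int fill_anon_region(size_t len)
{
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
#ifdef MADV_HUGEPAGE
	/* Advisory only: a failure (e.g. THP not built in) is ignored. */
	(void)madvise(p, len, MADV_HUGEPAGE);
#endif
	memset(p, 0x5a, len);
	int ok = (p[0] == 0x5a && p[len - 1] == 0x5a);
	munmap(p, len);
	return ok ? 0 : -1;
}

/* Fork nproc children that each fill a region; return 0 if all succeed. */
static int run_workers(int nproc, size_t len)
{
	int failed = 0;
	int started = 0;

	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();
		if (pid < 0) {
			failed = 1;
			break;
		}
		if (pid == 0)
			_exit(fill_anon_region(len) == 0 ? 0 : 1);
		started++;
	}
	/* Reap every child that was started and check its exit status. */
	for (int i = 0; i < started; i++) {
		int status;
		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed = 1;
	}
	return failed ? -1 : 0;
}
```

To get reclaim pressure like in the traces, something like run_workers()
would have to run inside a memcg whose limit is smaller than the total
memory the children fill.
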
> > 
> > Within a few hours, I encounter following bug.
> > Attached detailed boot log and dmesg result.
> > 
> > 
> > Initializing cgroup subsys cpu
> > Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > KERNEL supported cpus:
> >   Intel GenuineIntel
> > x86/fpu: Legacy x87 FPU detected.
> > x86/fpu: Using 'lazy' FPU context switches.
> > e820: BIOS-provided physical RAM map:
> > BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> > BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> > BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > 
> > <snip>
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > PGD 0 
> > Oops: 0000 [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
> > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > RSP: 0018:ffff88007fea3648  EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
> > RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
> > RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
> > R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
> > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > FS:  00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
> > Stack:
> >  ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
> >  ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
> >  ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
> > Call Trace:
> >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> >  [<ffffffff811496c3>] try_charge+0x163/0x700
> >  [<ffffffff81149cb4>] mem_cgroup_do_precharge+0x54/0x70
> >  [<ffffffff81149e45>] mem_cgroup_can_attach+0x175/0x1b0
> >  [<ffffffff811b2c57>] ? kernfs_iattrs.isra.6+0x37/0xd0
> >  [<ffffffff81148e70>] ? get_mctgt_type+0x320/0x320
> >  [<ffffffff810a9d29>] cgroup_migrate+0x149/0x440
> >  [<ffffffff810aa60c>] cgroup_attach_task+0x7c/0xe0
> >  [<ffffffff810aa904>] __cgroup_procs_write.isra.33+0x1d4/0x2b0
> >  [<ffffffff810aaa10>] cgroup_tasks_write+0x10/0x20
> >  [<ffffffff810a6238>] cgroup_file_write+0x38/0xf0
> >  [<ffffffff811b54ad>] kernfs_fop_write+0x11d/0x170
> >  [<ffffffff81153918>] __vfs_write+0x28/0xe0
> >  [<ffffffff8116e614>] ? __fd_install+0x24/0xc0
> >  [<ffffffff810784a1>] ? percpu_down_read+0x21/0x50
> >  [<ffffffff81153e91>] vfs_write+0xa1/0x170
> >  [<ffffffff81154716>] SyS_write+0x46/0xa0
> >  [<ffffffff81420a17>] entry_SYSCALL_64_fastpath+0x12/0x6a
> > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> >  RSP <ffff88007fea3648>
> > CR2: 0000000000000008
> > BUG: unable to handle kernel ---[ end trace e81a82c8122b447d ]---
> > Kernel panic - not syncing: Fatal exception
> > 
> > NULL pointer dereference at 0000000000000008
> > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > PGD 0 
> > Oops: 0000 [#2] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 10 PID: 59 Comm: khugepaged Tainted: G      D         4.3.0-rc5-mm1-diet-meta+ #1545
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
> > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
> > RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
> > RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
> > R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
> > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > FS:  0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
> > Stack:
> >  ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
> >  ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
> >  ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
> > Call Trace:
> >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> >  [<ffffffff811496c3>] try_charge+0x163/0x700
> >  [<ffffffff8141d1f3>] ? schedule+0x33/0x80
> >  [<ffffffff8114d45f>] mem_cgroup_try_charge+0x9f/0x1d0
> >  [<ffffffff811434bc>] khugepaged+0x7cc/0x1ac0
> >  [<ffffffff81066e01>] ? hrtick_update+0x1/0x70
> >  [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
> >  [<ffffffff81142cf0>] ? total_mapcount+0x70/0x70
> >  [<ffffffff81056cd9>] kthread+0xc9/0xe0
> >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> >  [<ffffffff81420d6f>] ret_from_fork+0x3f/0x70
> >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> >  RSP <ffff8800b985f778>
> > CR2: 0000000000000008
> > ---[ end trace e81a82c8122b447e ]---
> > Shutting down cpus with NMI
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
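
A note on the faulting address in the two oopses above: both fault at
0000000000000008 with RDI = 8 in down_read_trylock(). That fits a NULL
anon_vma->root, since the rwsem is taken through &root->rwsem and the rwsem
member follows the root pointer in struct anon_vma. A minimal sketch of the
arithmetic, assuming a 64-bit layout with root as the first member; the struct
below is a mock for illustration, not the kernel definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of the first two members of struct anon_vma (layout assumed, 64-bit). */
struct mock_anon_vma {
	uint64_t root;		/* stand-in for struct anon_vma *root */
	struct {
		int64_t count;	/* stand-in for the rw_semaphore contents */
	} rwsem;
};

/*
 * Hypothetical helper: address that &root->rwsem evaluates to for a given
 * root pointer value, computed without dereferencing anything.
 */
static uint64_t rwsem_address(uint64_t root)
{
	return root + offsetof(struct mock_anon_vma, rwsem);
}
```

With root == 0 this gives 8, which matches CR2 and RDI in the traces.
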
> > 
> 
> > QEMU 2.0.0 monitor - type 'help' for more information
> > (qemu) s^[[Kearly console in setup code
> > Initializing cgroup subsys cpu
> > Linux version 4.3.0-rc5-mm1-diet-meta+ (barrios@bbox) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) ) #1545 SMP Tue Oct 20 08:55:45 KST 2015
> > Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > KERNEL supported cpus:
> >   Intel GenuineIntel
> > x86/fpu: Legacy x87 FPU detected.
> > x86/fpu: Using 'lazy' FPU context switches.
> > e820: BIOS-provided physical RAM map:
> > BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> > BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> > BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > bootconsole [earlyser0] enabled
> > debug: ignoring loglevel setting.
> > NX (Execute Disable) protection: active
> > SMBIOS 2.4 present.
> > DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> > e820: remove [mem 0x000a0000-0x000fffff] usable
> > e820: last_pfn = 0xbfffc max_arch_pfn = 0x400000000
> > MTRR default type: write-back
> > MTRR fixed ranges enabled:
> >   00000-9FFFF write-back
> >   A0000-BFFFF uncachable
> >   C0000-FFFFF write-protect
> > MTRR variable ranges enabled:
> >   0 base 00C0000000 mask FFC0000000 uncachable
> >   1 disabled
> >   2 disabled
> >   3 disabled
> >   4 disabled
> >   5 disabled
> >   6 disabled
> >   7 disabled
> > x86/PAT: PAT not supported by CPU.
> > Scan for SMP in [mem 0x00000000-0x000003ff]
> > Scan for SMP in [mem 0x0009fc00-0x0009ffff]
> > Scan for SMP in [mem 0x000f0000-0x000fffff]
> > found SMP MP-table at [mem 0x000f0a70-0x000f0a7f] mapped at [ffff8800000f0a70]
> >   mpc: f0a80-f0c44
> > Scanning 1 areas for low memory corruption
> > Base memory trampoline at [ffff880000099000] 99000 size 24576
> > init_memory_mapping: [mem 0x00000000-0x000fffff]
> >  [mem 0x00000000-0x000fffff] page 4k
> > BRK [0x0220e000, 0x0220efff] PGTABLE
> > BRK [0x0220f000, 0x0220ffff] PGTABLE
> > BRK [0x02210000, 0x02210fff] PGTABLE
> > init_memory_mapping: [mem 0xbfc00000-0xbfdfffff]
> >  [mem 0xbfc00000-0xbfdfffff] page 2M
> > BRK [0x02211000, 0x02211fff] PGTABLE
> > init_memory_mapping: [mem 0xa0000000-0xbfbfffff]
> >  [mem 0xa0000000-0xbfbfffff] page 2M
> > init_memory_mapping: [mem 0x80000000-0x9fffffff]
> >  [mem 0x80000000-0x9fffffff] page 2M
> > init_memory_mapping: [mem 0x00100000-0x7fffffff]
> >  [mem 0x00100000-0x001fffff] page 4k
> >  [mem 0x00200000-0x7fffffff] page 2M
> > init_memory_mapping: [mem 0xbfe00000-0xbfffbfff]
> >  [mem 0xbfe00000-0xbfffbfff] page 4k
> > BRK [0x02212000, 0x02212fff] PGTABLE
> > RAMDISK: [mem 0x7851a000-0x7fffffff]
> >  [ffffea0000000000-ffffea0002ffffff] PMD -> [ffff8800bc400000-ffff8800bf3fffff] on node 0
> > Zone ranges:
> >   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >   DMA32    [mem 0x0000000001000000-0x00000000bfffbfff]
> >   Normal   empty
> > Movable zone start for each node
> > Early memory node ranges
> >   node   0: [mem 0x0000000000001000-0x000000000009efff]
> >   node   0: [mem 0x0000000000100000-0x00000000bfffbfff]
> > Initmem setup node 0 [mem 0x0000000000001000-0x00000000bfffbfff]
> > On node 0 totalpages: 786330
> >   DMA zone: 64 pages used for memmap
> >   DMA zone: 21 pages reserved
> >   DMA zone: 3998 pages, LIFO batch:0
> >   DMA32 zone: 12224 pages used for memmap
> >   DMA32 zone: 782332 pages, LIFO batch:31
> > Intel MultiProcessor Specification v1.4
> >   mpc: f0a80-f0c44
> > MPTABLE: OEM ID: BOCHSCPU
> > MPTABLE: Product ID: 0.1         
> > MPTABLE: APIC at: 0xFEE00000
> > mapped APIC to ffffffffff5fd000 (        fee00000)
> > Processor #0 (Bootup-CPU)
> > Processor #1
> > Processor #2
> > Processor #3
> > Processor #4
> > Processor #5
> > Processor #6
> > Processor #7
> > Processor #8
> > Processor #9
> > Processor #10
> > Processor #11
> > Bus #0 is PCI   
> > Bus #1 is ISA   
> > IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 09
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 0c, APIC ID 0, APIC INT 0b
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 10, APIC ID 0, APIC INT 0b
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 14, APIC ID 0, APIC INT 0a
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 18, APIC ID 0, APIC INT 0a
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC INT 02
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 01, APIC ID 0, APIC INT 01
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 03, APIC ID 0, APIC INT 03
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 04, APIC ID 0, APIC INT 04
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 06, APIC ID 0, APIC INT 06
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 07, APIC ID 0, APIC INT 07
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 08, APIC ID 0, APIC INT 08
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0c, APIC ID 0, APIC INT 0c
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0d, APIC ID 0, APIC INT 0d
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0e, APIC ID 0, APIC INT 0e
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0f, APIC ID 0, APIC INT 0f
> > Lint: type 3, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC LINT 00
> > Lint: type 1, pol 0, trig 0, bus 01, IRQ 00, APIC ID ff, APIC LINT 01
> > Processors: 12
> > smpboot: Allowing 12 CPUs, 0 hotplug CPUs
> > mapped IOAPIC to ffffffffff5fc000 (fec00000)
> > e820: [mem 0xc0000000-0xfeffbfff] available for PCI devices
> > clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > setup_percpu: NR_CPUS:16 nr_cpumask_bits:16 nr_cpu_ids:12 nr_node_ids:1
> > PERCPU: Embedded 31 pages/cpu @ffff8800bfa00000 s87640 r8192 d31144 u131072
> > pcpu-alloc: s87640 r8192 d31144 u131072 alloc=1*2097152
> > pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 -- -- -- -- 
> > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 774021
> > Kernel command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > sysrq: sysrq always enabled.
> > log_buf_len individual max cpu contribution: 2097152 bytes
> > log_buf_len total cpu_extra contributions: 23068672 bytes
> > log_buf_len min size: 8388608 bytes
> > log_buf_len: 33554432 bytes
> > early log buf free: 8380096(99%)
> > PID hash table entries: 4096 (order: 3, 32768 bytes)
> > Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> > Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> > Memory: 2911172K/3145320K available (4237K kernel code, 721K rwdata, 1988K rodata, 936K init, 8608K bss, 234148K reserved, 0K cma-reserved)
> > SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=12, Nodes=1
> > Hierarchical RCU implementation.
> > 	Build-time adjustment of leaf fanout to 64.
> > 	RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=12.
> > RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=12
> > NR_IRQS:4352 nr_irqs:136 16
> > Console: colour VGA+ 80x25
> > console [tty0] enabled
> > bootconsole [earlyser0] disabled
> > Initializing cgroup subsys cpu
> > Linux version 4.3.0-rc5-mm1-diet-meta+ (barrios@bbox) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) ) #1545 SMP Tue Oct 20 08:55:45 KST 2015
> > Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > KERNEL supported cpus:
> >   Intel GenuineIntel
> > x86/fpu: Legacy x87 FPU detected.
> > x86/fpu: Using 'lazy' FPU context switches.
> > e820: BIOS-provided physical RAM map:
> > BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> > BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> > BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > bootconsole [earlyser0] enabled
> > debug: ignoring loglevel setting.
> > NX (Execute Disable) protection: active
> > SMBIOS 2.4 present.
> > DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> > e820: remove [mem 0x000a0000-0x000fffff] usable
> > e820: last_pfn = 0xbfffc max_arch_pfn = 0x400000000
> > MTRR default type: write-back
> > MTRR fixed ranges enabled:
> >   00000-9FFFF write-back
> >   A0000-BFFFF uncachable
> >   C0000-FFFFF write-protect
> > MTRR variable ranges enabled:
> >   0 base 00C0000000 mask FFC0000000 uncachable
> >   1 disabled
> >   2 disabled
> >   3 disabled
> >   4 disabled
> >   5 disabled
> >   6 disabled
> >   7 disabled
> > x86/PAT: PAT not supported by CPU.
> > Scan for SMP in [mem 0x00000000-0x000003ff]
> > Scan for SMP in [mem 0x0009fc00-0x0009ffff]
> > Scan for SMP in [mem 0x000f0000-0x000fffff]
> > found SMP MP-table at [mem 0x000f0a70-0x000f0a7f] mapped at [ffff8800000f0a70]
> >   mpc: f0a80-f0c44
> > Scanning 1 areas for low memory corruption
> > Base memory trampoline at [ffff880000099000] 99000 size 24576
> > init_memory_mapping: [mem 0x00000000-0x000fffff]
> >  [mem 0x00000000-0x000fffff] page 4k
> > BRK [0x0220e000, 0x0220efff] PGTABLE
> > BRK [0x0220f000, 0x0220ffff] PGTABLE
> > BRK [0x02210000, 0x02210fff] PGTABLE
> > init_memory_mapping: [mem 0xbfc00000-0xbfdfffff]
> >  [mem 0xbfc00000-0xbfdfffff] page 2M
> > BRK [0x02211000, 0x02211fff] PGTABLE
> > init_memory_mapping: [mem 0xa0000000-0xbfbfffff]
> >  [mem 0xa0000000-0xbfbfffff] page 2M
> > init_memory_mapping: [mem 0x80000000-0x9fffffff]
> >  [mem 0x80000000-0x9fffffff] page 2M
> > init_memory_mapping: [mem 0x00100000-0x7fffffff]
> >  [mem 0x00100000-0x001fffff] page 4k
> >  [mem 0x00200000-0x7fffffff] page 2M
> > init_memory_mapping: [mem 0xbfe00000-0xbfffbfff]
> >  [mem 0xbfe00000-0xbfffbfff] page 4k
> > BRK [0x02212000, 0x02212fff] PGTABLE
> > RAMDISK: [mem 0x7851a000-0x7fffffff]
> >  [ffffea0000000000-ffffea0002ffffff] PMD -> [ffff8800bc400000-ffff8800bf3fffff] on node 0
> > Zone ranges:
> >   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >   DMA32    [mem 0x0000000001000000-0x00000000bfffbfff]
> >   Normal   empty
> > Movable zone start for each node
> > Early memory node ranges
> >   node   0: [mem 0x0000000000001000-0x000000000009efff]
> >   node   0: [mem 0x0000000000100000-0x00000000bfffbfff]
> > Initmem setup node 0 [mem 0x0000000000001000-0x00000000bfffbfff]
> > On node 0 totalpages: 786330
> >   DMA zone: 64 pages used for memmap
> >   DMA zone: 21 pages reserved
> >   DMA zone: 3998 pages, LIFO batch:0
> >   DMA32 zone: 12224 pages used for memmap
> >   DMA32 zone: 782332 pages, LIFO batch:31
> > Intel MultiProcessor Specification v1.4
> >   mpc: f0a80-f0c44
> > MPTABLE: OEM ID: BOCHSCPU
> > MPTABLE: Product ID: 0.1         
> > MPTABLE: APIC at: 0xFEE00000
> > mapped APIC to ffffffffff5fd000 (        fee00000)
> > Processor #0 (Bootup-CPU)
> > Processor #1
> > Processor #2
> > Processor #3
> > Processor #4
> > Processor #5
> > Processor #6
> > Processor #7
> > Processor #8
> > Processor #9
> > Processor #10
> > Processor #11
> > Bus #0 is PCI   
> > Bus #1 is ISA   
> > IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 09
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 0c, APIC ID 0, APIC INT 0b
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 10, APIC ID 0, APIC INT 0b
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 14, APIC ID 0, APIC INT 0a
> > Int: type 0, pol 1, trig 0, bus 00, IRQ 18, APIC ID 0, APIC INT 0a
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC INT 02
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 01, APIC ID 0, APIC INT 01
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 03, APIC ID 0, APIC INT 03
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 04, APIC ID 0, APIC INT 04
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 06, APIC ID 0, APIC INT 06
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 07, APIC ID 0, APIC INT 07
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 08, APIC ID 0, APIC INT 08
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0c, APIC ID 0, APIC INT 0c
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0d, APIC ID 0, APIC INT 0d
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0e, APIC ID 0, APIC INT 0e
> > Int: type 0, pol 0, trig 0, bus 01, IRQ 0f, APIC ID 0, APIC INT 0f
> > Lint: type 3, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC LINT 00
> > Lint: type 1, pol 0, trig 0, bus 01, IRQ 00, APIC ID ff, APIC LINT 01
> > Processors: 12
> > smpboot: Allowing 12 CPUs, 0 hotplug CPUs
> > mapped IOAPIC to ffffffffff5fc000 (fec00000)
> > e820: [mem 0xc0000000-0xfeffbfff] available for PCI devices
> > clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > setup_percpu: NR_CPUS:16 nr_cpumask_bits:16 nr_cpu_ids:12 nr_node_ids:1
> > PERCPU: Embedded 31 pages/cpu @ffff8800bfa00000 s87640 r8192 d31144 u131072
> > pcpu-alloc: s87640 r8192 d31144 u131072 alloc=1*2097152
> > pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 -- -- -- -- 
> > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 774021
> > Kernel command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > sysrq: sysrq always enabled.
> > log_buf_len individual max cpu contribution: 2097152 bytes
> > log_buf_len total cpu_extra contributions: 23068672 bytes
> > log_buf_len min size: 8388608 bytes
> > log_buf_len: 33554432 bytes
> > early log buf free: 8380096(99%)
> > PID hash table entries: 4096 (order: 3, 32768 bytes)
> > Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> > Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> > Memory: 2911172K/3145320K available (4237K kernel code, 721K rwdata, 1988K rodata, 936K init, 8608K bss, 234148K reserved, 0K cma-reserved)
> > SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=12, Nodes=1
> > Hierarchical RCU implementation.
> > 	Build-time adjustment of leaf fanout to 64.
> > 	RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=12.
> > RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=12
> > NR_IRQS:4352 nr_irqs:136 16
> > Console: colour VGA+ 80x25
> > console [tty0] enabled
> > bootconsole [earlyser0] disabled
> > console [ttyS0] enabled
> > tsc: Fast TSC calibration using PIT
> > tsc: Detected 3199.926 MHz processor
> > Calibrating delay loop (skipped), value calculated using timer frequency.. 6399.85 BogoMIPS (lpj=12799704)
> > pid_max: default: 32768 minimum: 301
> > Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
> > Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
> > Initializing cgroup subsys memory
> > Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> > Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> > Freeing SMP alternatives memory: 20K (ffffffff819a0000 - ffffffff819a5000)
> > ftrace: allocating 16664 entries in 66 pages
> > Switched APIC routing to physical flat.
> > enabled ExtINT on CPU#0
> > ENABLING IO-APIC IRQs
> > init IO_APIC IRQs
> >  apic 0 pin 0 not connected
> > IOAPIC[0]: Set routing entry (0-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-4 -> 0x34 -> IRQ 4 Mode:0 Active:0 Dest:0)
> >  apic 0 pin 5 not connected
> > IOAPIC[0]: Set routing entry (0-6 -> 0x36 -> IRQ 6 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-7 -> 0x37 -> IRQ 7 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-8 -> 0x38 -> IRQ 8 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-9 -> 0x39 -> IRQ 9 Mode:1 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-10 -> 0x3a -> IRQ 10 Mode:1 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-11 -> 0x3b -> IRQ 11 Mode:1 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-12 -> 0x3c -> IRQ 12 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-13 -> 0x3d -> IRQ 13 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-14 -> 0x3e -> IRQ 14 Mode:0 Active:0 Dest:0)
> > IOAPIC[0]: Set routing entry (0-15 -> 0x3f -> IRQ 15 Mode:0 Active:0 Dest:0)
> >  apic 0 pin 16 not connected
> >  apic 0 pin 17 not connected
> >  apic 0 pin 18 not connected
> >  apic 0 pin 19 not connected
> >  apic 0 pin 20 not connected
> >  apic 0 pin 21 not connected
> >  apic 0 pin 22 not connected
> >  apic 0 pin 23 not connected
> > ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> > Using local APIC timer interrupts.
> > calibrating APIC timer ...
> > ... lapic delta = 6251755
> > ..... delta 6251755
> > ..... mult: 268510832
> > ..... calibration result: 4001123
> > ..... CPU clock speed is 3200.3592 MHz.
> > ..... host bus clock speed is 1000.1123 MHz.
> > ... verify APIC timer
> > ... jiffies delta = 25
> > ... jiffies result ok
> > smpboot: CPU0: Intel QEMU Virtual CPU version 2.0.0 (family: 0x6, model: 0x6, stepping: 0x3)
> > Performance Events: Broken PMU hardware detected, using software events only.
> > Failed to access perfctr msr (MSR c2 is 0)
> > x86: Booting SMP configuration:
> > .... node  #0, CPUs:        #1
> > masked ExtINT on CPU#1
> >   #2
> > masked ExtINT on CPU#2
> >   #3
> > masked ExtINT on CPU#3
> >   #4
> > masked ExtINT on CPU#4
> >   #5
> > masked ExtINT on CPU#5
> >   #6
> > masked ExtINT on CPU#6
> >   #7
> > masked ExtINT on CPU#7
> >   #8
> > masked ExtINT on CPU#8
> >   #9
> > masked ExtINT on CPU#9
> >  #10
> > masked ExtINT on CPU#10
> >  #11
> > masked ExtINT on CPU#11
> > x86: Booted up 1 node, 12 CPUs
> > smpboot: Total of 12 processors activated (76818.13 BogoMIPS)
> > devtmpfs: initialized
> > clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> > NET: Registered protocol family 16
> > PCI: Using configuration type 1 for base access
> > vgaarb: loaded
> > SCSI subsystem initialized
> > libata version 3.00 loaded.
> > PCI: Probing PCI hardware
> > PCI: root bus 00: using default resources
> > PCI: Probing PCI hardware (bus 00)
> > PCI host bridge to bus 0000:00
> > pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
> > pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffffff]
> > pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
> > pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
> > pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
> > pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
> > pci 0000:00:01.1: reg 0x20: [io  0xc0c0-0xc0cf]
> > pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
> > pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
> > pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
> > pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
> > pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
> > pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
> > pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
> > pci 0000:00:02.0: reg 0x14: [mem 0xfebd0000-0xfebd0fff]
> > pci 0000:00:02.0: reg 0x30: [mem 0xfebc0000-0xfebcffff pref]
> > vgaarb: setting as boot device: PCI:0000:00:02.0
> > vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> > pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
> > pci 0000:00:03.0: reg 0x10: [io  0xc080-0xc09f]
> > pci 0000:00:03.0: reg 0x14: [mem 0xfebd1000-0xfebd1fff]
> > pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref]
> > pci 0000:00:04.0: [1af4:1002] type 00 class 0x00ff00
> > pci 0000:00:04.0: reg 0x10: [io  0xc0a0-0xc0bf]
> > pci 0000:00:05.0: [1af4:1001] type 00 class 0x010000
> > pci 0000:00:05.0: reg 0x10: [io  0xc000-0xc03f]
> > pci 0000:00:05.0: reg 0x14: [mem 0xfebd2000-0xfebd2fff]
> > pci 0000:00:06.0: [1af4:1001] type 00 class 0x010000
> > pci 0000:00:06.0: reg 0x10: [io  0xc040-0xc07f]
> > pci 0000:00:06.0: reg 0x14: [mem 0xfebd3000-0xfebd3fff]
> > pci 0000:00:07.0: [8086:25ab] type 00 class 0x088000
> > pci 0000:00:07.0: reg 0x10: [mem 0xfebd4000-0xfebd400f]
> > pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
> > pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> > PCI: pci_cache_line_size set to 64 bytes
> > e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
> > e820: reserve RAM buffer [mem 0xbfffc000-0xbfffffff]
> > clocksource: Switched to clocksource refined-jiffies
> > pci_bus 0000:00: resource 4 [io  0x0000-0xffff]
> > pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffffff]
> > NET: Registered protocol family 2
> > TCP established hash table entries: 32768 (order: 6, 262144 bytes)
> > TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
> > TCP: Hash tables configured (established 32768 bind 32768)
> > UDP hash table entries: 2048 (order: 4, 65536 bytes)
> > UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
> > NET: Registered protocol family 1
> > Trying to unpack rootfs image as initramfs...
> > Freeing initrd memory: 125848K (ffff88007851a000 - ffff880080000000)
> > platform rtc_cmos: registered platform RTC device (no PNP device found)
> > Scanning for low memory corruption every 60 seconds
> > futex hash table entries: 4096 (order: 6, 262144 bytes)
> > HugeTLB registered 2 MB page size, pre-allocated 0 pages
> > fuse init (API version 7.23)
> > 9p: Installing v9fs 9p2000 file system support
> > cryptomgr_test (74) used greatest stack depth: 15352 bytes left
> > cryptomgr_test (82) used greatest stack depth: 15136 bytes left
> > Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
> > io scheduler noop registered
> > io scheduler deadline registered
> > io scheduler cfq registered (default)
> > querying PCI -> IRQ mapping bus:0, slot:3, pin:0.
> > virtio-pci 0000:00:03.0: PCI->APIC IRQ transform: INT A -> IRQ 11
> > virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
> > querying PCI -> IRQ mapping bus:0, slot:4, pin:0.
> > virtio-pci 0000:00:04.0: PCI->APIC IRQ transform: INT A -> IRQ 11
> > virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
> > querying PCI -> IRQ mapping bus:0, slot:5, pin:0.
> > virtio-pci 0000:00:05.0: PCI->APIC IRQ transform: INT A -> IRQ 10
> > virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
> > querying PCI -> IRQ mapping bus:0, slot:6, pin:0.
> > virtio-pci 0000:00:06.0: PCI->APIC IRQ transform: INT A -> IRQ 10
> > virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
> > Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
> > serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > Linux agpgart interface v0.103
> > brd: module loaded
> > loop: module loaded
> >  vda: vda1 vda2 < vda5 >
> > zram: Added device: zram0
> > libphy: Fixed MDIO Bus: probed
> > tun: Universal TUN/TAP device driver, 1.6
> > tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> > serio: i8042 KBD port at 0x60,0x64 irq 1
> > serio: i8042 AUX port at 0x60,0x64 irq 12
> > mousedev: PS/2 mouse device common for all mice
> > rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> > rtc_cmos rtc_cmos: alarms up to one day, 114 bytes nvram
> > device-mapper: ioctl: 4.33.0-ioctl (2015-8-18) initialised: dm-devel@redhat.com
> > device-mapper: cache cleaner: version 1.0.0 loaded
> > NET: Registered protocol family 17
> > 9pnet: Installing 9P2000 support
> > ... APIC ID:      00000000 (0)
> > ... APIC VERSION: 01050014
> > 0000000000000000000000000000000000000000000000000000000000000000
> > 000000000e000000000000000000000000000000000000000000000000000000
> > 0000000000020000000000000000000000000000000000000000000000008000
> > 
> > number of MP IRQ sources: 16.
> > number of IO-APIC #0 registers: 24.
> > testing the IO APIC.......................
> > IO APIC #0......
> > .... register #00: 00000000
> > .......    : physical APIC id: 00
> > .......    : Delivery Type: 0
> > .......    : LTS          : 0
> > .... register #01: 00170011
> > .......     : max redirection entries: 17
> > .......     : PRQ implemented: 0
> > .......     : IO APIC version: 11
> > .... register #02: 00000000
> > .......     : arbitration: 00
> > .... IRQ redirection table:
> > IOAPIC 0:
> >  pin00, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin01, enabled , edge , high, V(31), IRR(0), S(0), physical, D(00), M(0)
> >  pin02, enabled , edge , high, V(30), IRR(0), S(0), physical, D(00), M(0)
> >  pin03, enabled , edge , high, V(33), IRR(0), S(0), physical, D(00), M(0)
> >  pin04, disabled, edge , high, V(34), IRR(0), S(0), physical, D(00), M(0)
> >  pin05, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin06, enabled , edge , high, V(36), IRR(0), S(0), physical, D(00), M(0)
> >  pin07, enabled , edge , high, V(37), IRR(0), S(0), physical, D(00), M(0)
> >  pin08, enabled , edge , high, V(38), IRR(0), S(0), physical, D(00), M(0)
> >  pin09, disabled, level, high, V(39), IRR(0), S(0), physical, D(00), M(0)
> >  pin0a, enabled , level, high, V(3A), IRR(0), S(0), physical, D(00), M(0)
> >  pin0b, enabled , level, high, V(3B), IRR(0), S(0), physical, D(00), M(0)
> >  pin0c, enabled , edge , high, V(3C), IRR(0), S(0), physical, D(00), M(0)
> >  pin0d, enabled , edge , high, V(3D), IRR(0), S(0), physical, D(00), M(0)
> >  pin0e, enabled , edge , high, V(3E), IRR(0), S(0), physical, D(00), M(0)
> >  pin0f, enabled , edge , high, V(3F), IRR(0), S(0), physical, D(00), M(0)
> >  pin10, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin11, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin12, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin13, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin14, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin15, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin16, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> >  pin17, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > IRQ to pin mappings:
> > IRQ0 -> 0:2
> > IRQ1 -> 0:1
> > IRQ3 -> 0:3
> > IRQ4 -> 0:4
> > IRQ6 -> 0:6
> > IRQ7 -> 0:7
> > IRQ8 -> 0:8
> > IRQ9 -> 0:9
> > IRQ10 -> 0:10
> > IRQ11 -> 0:11
> > IRQ12 -> 0:12
> > IRQ13 -> 0:13
> > IRQ14 -> 0:14
> > IRQ15 -> 0:15
> > .................................... done.
> > rtc_cmos rtc_cmos: setting system clock to 2015-10-20 08:57:55 UTC (1445331475)
> > input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> > Freeing unused kernel memory: 936K (ffffffff818b6000 - ffffffff819a0000)
> > Write protecting the kernel read-only data: 8192k
> > Freeing unused kernel memory: 1900K (ffff880001425000 - ffff880001600000)
> > Freeing unused kernel memory: 60K (ffff8800017f1000 - ffff880001800000)
> > busybox (117) used greatest stack depth: 14480 bytes left
> > exe (124) used greatest stack depth: 14024 bytes left
> > udevd[140]: starting version 175
> > blkid (151) used greatest stack depth: 13920 bytes left
> > modprobe (242) used greatest stack depth: 13784 bytes left
> > clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2e200418439, max_idle_ns: 440795220848 ns
> > clocksource: Switched to clocksource tsc
> > EXT4-fs (vda1): recovery complete
> > EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
> > exe (262) used greatest stack depth: 13032 bytes left
> > random: init urandom read with 9 bits of entropy available
> > init: plymouth-upstart-bridge main process (279) terminated with status 1
> > init: plymouth-upstart-bridge main process ended, respawning
> > init: plymouth-upstart-bridge main process (289) terminated with status 1
> > init: plymouth-upstart-bridge main process ended, respawning
> > init: plymouth-upstart-bridge main process (293) terminated with status 1
> > init: plymouth-upstart-bridge main process ended, respawning
> > init: ureadahead main process (282) terminated with status 5
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > systemd-udevd[423]: starting version 204
> > EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
> >  * Stopping Send an event to indicate plymouth is up^[[74G[ OK ]
> >  * Starting Mount filesystems on boot^[[74G[ OK ]
> >  * Starting Signal sysvinit that the rootfs is mounted^[[74G[ OK ]
> >  * Starting Populate /dev filesystem^[[74G[ OK ]
> >  * Starting Populate and link to /run filesystem^[[74G[ OK ]
> >  * Stopping Populate /dev filesystem^[[74G[ OK ]
> >  * Stopping Populate and link to /run filesystem^[[74G[ OK ]
> >  * Starting Clean /tmp directory^[[74G[ OK ]
> >  * Stopping Track if upstart is running in a container^[[74G[ OK ]
> >  * Stopping Clean /tmp directory^[[74G[ OK ]
> >  * Starting Initialize or finalize resolvconf^[[74G[ OK ]
> >  * Starting set console keymap^[[74G[ OK ]
> >  * Starting Signal sysvinit that virtual filesystems are mounted^[[74G[ OK ]
> >  * Starting Signal sysvinit that virtual filesystems are mounted^[[74G[ OK ]
> >  * Starting Bridge udev events into upstart^[[74G[ OK ]
> >  * Starting Signal sysvinit that remote filesystems are mounted^[[74G[ OK ]
> >  * Stopping set console keymap^[[74G[ OK ]
> >  * Starting device node and kernel event manager^[[74G[ OK ]
> >  * Starting load modules from /etc/modules^[[74G[ OK ]
> >  * Starting cold plug devices^[[74G[ OK ]
> >  * Starting log initial device creation^[[74G[ OK ]
> >  * Stopping Read required files in advance (for other mountpoints)^[[74G[ OK ]
> >  * Stopping load modules from /etc/modules^[[74G[ OK ]
> >  * Starting Signal sysvinit that local filesystems are mounted^[[74G[ OK ]
> >  * Starting flush early job output to logs^[[74G[ OK ]
> >  * Stopping Mount filesystems on boot^[[74G[ OK ]
> >  * Stopping flush early job output to logs^[[74G[ OK ]
> >  * Starting D-Bus system message bus^[[74G[ OK ]
> >  * Starting SystemD login management service^[[74G[ OK ]
> >  * Starting system logging daemon^[[74G[ OK ]
> >  * Stopping cold plug devices^[[74G[ OK ]
> >  * Starting Uncomplicated firewall^[[74G[ OK ]
> >  * Starting configure network device security^[[74G[ OK ]
> >  * Stopping log initial device creation^[[74G[ OK ]
> >  * Starting configure network device security^[[74G[ OK ]
> >  * Starting save udev log and update rules^[[74G[ OK ]
> >  * Starting set console font^[[74G[ OK ]
> >  * Stopping save udev log and update rules^[[74G[ OK ]
> >  * Starting Mount network filesystems^[[74G[ OK ]
> >  * Starting Failsafe Boot Delay^[[74G[ OK ]
> >  * Starting configure network device security^[[74G[ OK ]
> >  * Stopping Mount network filesystems^[[74G[ OK ]
> >  * Starting configure network device^[[74G[ OK ]
> >  * Starting configure network device^[[74G[ OK ]
> >  * Starting Bridge file events into upstart^[[74G[ OK ]
> >  * Starting Bridge socket events into upstart^[[74G[ OK ]
> >  * Stopping set console font^[[74G[ OK ]
> >  * Starting userspace bootsplash^[[74G[ OK ]
> >  * Starting Send an event to indicate plymouth is up^[[74G[ OK ]
> >  * Stopping userspace bootsplash^[[74G[ OK ]
> >  * Stopping Send an event to indicate plymouth is up^[[74G[ OK ]
> >  * Starting Mount network filesystems^[[74G[ OK ]
> > init: failsafe main process (591) killed by TERM signal
> >  * Stopping Failsafe Boot Delay^[[74G[ OK ]
> >  * Starting System V initialisation compatibility^[[74G[ OK ]
> >  * Stopping Mount network filesystems^[[74G[ OK ]
> >  * Starting configure virtual network devices^[[74G[ OK ]
> >  * Stopping System V initialisation compatibility^[[74G[ OK ]
> >  * Starting System V runlevel compatibility^[[74G[ OK ]
> >  * Starting deferred execution scheduler^[[74G[ OK ]
> >  * Starting regular background program processing daemon^[[74G[ OK ]
> >  * Starting ACPI daemon^[[74G[ OK ]
> >  * Starting save kernel messages^[[74G[ OK ]
> >  * Starting CPU interrupts balancing daemon^[[74G[ OK ]
> >  * Stopping save kernel messages^[[74G[ OK ]
> >  * Starting OpenSSH server^[[74G[ OK ]
> >  * Starting automatic crash report generation^[[74G[ OK ]
> >  * Restoring resolver state...       ^[[80G 
> > ^[[74G[ OK ]
> > eth0 Link encap:Ethernet HWaddr 52:54:79:12:34:57 inet addr:192.168.0.21 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:34 errors:0 dropped:24 overruns:0 frame:0 TX packets:4 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:5780 (5.7 KB) TX bytes:800 (800.0 B) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
> >  * Stopping System V runlevel compatibility^[[74G[ OK ]
> > init: plymouth-upstart-bridge main process ended, respawning
> > sh (1429) used greatest stack depth: 11752 bytes left
> > sh (1454) used greatest stack depth: 11528 bytes left
> > random: nonblocking pool is initialized
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > sh (2785) used greatest stack depth: 11480 bytes left
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > PGD 0 
> > Oops: 0000 [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
> > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > RSP: 0018:ffff88007fea3648  EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
> > RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
> > RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
> > R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
> > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > FS:  00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
> > Stack:
> >  ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
> >  ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
> >  ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
> > Call Trace:
> >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> >  [<ffffffff811496c3>] try_charge+0x163/0x700
> >  [<ffffffff81149cb4>] mem_cgroup_do_precharge+0x54/0x70
> >  [<ffffffff81149e45>] mem_cgroup_can_attach+0x175/0x1b0
> >  [<ffffffff811b2c57>] ? kernfs_iattrs.isra.6+0x37/0xd0
> >  [<ffffffff81148e70>] ? get_mctgt_type+0x320/0x320
> >  [<ffffffff810a9d29>] cgroup_migrate+0x149/0x440
> >  [<ffffffff810aa60c>] cgroup_attach_task+0x7c/0xe0
> >  [<ffffffff810aa904>] __cgroup_procs_write.isra.33+0x1d4/0x2b0
> >  [<ffffffff810aaa10>] cgroup_tasks_write+0x10/0x20
> >  [<ffffffff810a6238>] cgroup_file_write+0x38/0xf0
> >  [<ffffffff811b54ad>] kernfs_fop_write+0x11d/0x170
> >  [<ffffffff81153918>] __vfs_write+0x28/0xe0
> >  [<ffffffff8116e614>] ? __fd_install+0x24/0xc0
> >  [<ffffffff810784a1>] ? percpu_down_read+0x21/0x50
> >  [<ffffffff81153e91>] vfs_write+0xa1/0x170
> >  [<ffffffff81154716>] SyS_write+0x46/0xa0
> >  [<ffffffff81420a17>] entry_SYSCALL_64_fastpath+0x12/0x6a
> > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> >  RSP <ffff88007fea3648>
> > CR2: 0000000000000008
> > BUG: unable to handle kernel ---[ end trace e81a82c8122b447d ]---
> > Kernel panic - not syncing: Fatal exception
> > 
> > NULL pointer dereference at 0000000000000008
> > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > PGD 0 
> > Oops: 0000 [#2] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 10 PID: 59 Comm: khugepaged Tainted: G      D         4.3.0-rc5-mm1-diet-meta+ #1545
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
> > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
> > RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
> > RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
> > R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
> > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > FS:  0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
> > Stack:
> >  ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
> >  ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
> >  ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
> > Call Trace:
> >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> >  [<ffffffff811496c3>] try_charge+0x163/0x700
> >  [<ffffffff8141d1f3>] ? schedule+0x33/0x80
> >  [<ffffffff8114d45f>] mem_cgroup_try_charge+0x9f/0x1d0
> >  [<ffffffff811434bc>] khugepaged+0x7cc/0x1ac0
> >  [<ffffffff81066e01>] ? hrtick_update+0x1/0x70
> >  [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
> >  [<ffffffff81142cf0>] ? total_mapcount+0x70/0x70
> >  [<ffffffff81056cd9>] kthread+0xc9/0xe0
> >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> >  [<ffffffff81420d6f>] ret_from_fork+0x3f/0x70
> >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> >  RSP <ffff8800b985f778>
> > CR2: 0000000000000008
> > ---[ end trace e81a82c8122b447e ]---
> > Shutting down cpus with NMI
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
> 

-- 
 Kirill A. Shutemov

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-21 11:07 ` Kirill A. Shutemov
@ 2015-10-22  0:06   ` Minchan Kim
  2015-10-22  0:59     ` Hugh Dickins
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-10-22  0:06 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, linux-mm, linux-kernel, Hugh Dickins,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Wed, Oct 21, 2015 at 02:07:23PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 21, 2015 at 02:28:36PM +0900, Minchan Kim wrote:
> > I detach this report from my patchset thread because I see below
> > problem with removing MADV_FREE related code and I can reproduce
> > same oops with MADV_FREE + recent patches(both my SetPageDirty
> > and Kirill's pte_mkdirty) within 7 hours.
> 
> Could you share code for your workload?

It's part of a test suite, so I need time to factor it out.
I will extract it, test it, and send it.

> 
> > I can not be sure it's THP refcount redesign's problem but it was
> > one of big change in MM between mmotm-2015-10-15-15-20 and
> > mmotm-2015-10-06-16-30 so it could be a culprit.
> > 
> > In page_lock_anon_vma_read, anon_vma_root was NULL.
> > I added VM_BUG_ON_PAGE(!root_anon_vma, page) in there and got the result.
> 
> Hm. That's tricky.. :-/
> 
> Could you please dump anon_vma->refcount too?

I added code to check it and queued the test again, but this time I hit a
different oops. The symptom is related to anon_vma, too.
(The kernel is based on recent mmotm + unconditional mkdirty for the bug fix.)
It seems page_get_anon_vma returned NULL because the page was not page_mapped
at that time, but the second page_mapped check right before try_to_unmap
seems to have been true.
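
To make the suspected sequence concrete, here is a minimal userspace model of
it. It is only a sketch: the model_* types and functions are hypothetical
stand-ins, not kernel code, and the "mapped in between" step stands for
whatever changes the mapped state between the two checks, which is exactly
the part that is not understood yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
struct model_anon_vma {
	int refcount;
};

struct model_page {
	int mapcount;			/* > 0 stands for page_mapped() */
	struct model_anon_vma *anon_vma;
};

static bool model_page_mapped(const struct model_page *page)
{
	return page->mapcount > 0;
}

/* Like page_get_anon_vma(): no reference is taken for an unmapped page. */
static struct model_anon_vma *model_get_anon_vma(struct model_page *page)
{
	assert(page != NULL && page->anon_vma != NULL);
	if (!model_page_mapped(page))
		return NULL;
	page->anon_vma->refcount++;
	return page->anon_vma;
}

/*
 * Returns true when the modelled assertion would fire: the page looks
 * mapped at the second check although the first check saw it unmapped
 * and therefore returned no anon_vma.
 */
static bool model_unmap_would_bug(struct model_page *page,
				  bool mapped_in_between)
{
	struct model_anon_vma *anon_vma = model_get_anon_vma(page);
	bool would_bug;

	if (mapped_in_between)
		page->mapcount++;	/* stands in for whatever maps the page */

	would_bug = model_page_mapped(page) && anon_vma == NULL;

	if (anon_vma)
		anon_vma->refcount--;	/* drop the reference taken above */
	return would_bug;
}
```

In this model the assertion only fires when the page starts unmapped and
becomes mapped between the two checks; a page that is mapped from the start
gets its anon_vma pinned and passes.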

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:ffff88007f3dcc00
------------[ cut here ]------------
kernel BUG at mm/migrate.c:889!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
RIP: 0010:[<ffffffff81145c06>]  [<ffffffff81145c06>] migrate_pages+0x8e6/0x950
RSP: 0018:ffff8800b985fa00  EFLAGS: 00010286
RAX: 0000000000000021 RBX: ffffea0002dd7fc0 RCX: ffffffff81830db8
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
RBP: ffff8800b985fa80 R08: 0000000000000000 R09: ffff8800000bb160
R10: ffffffff8163e000 R11: 00000000000001e0 R12: 0000000000000000
R13: ffffea0001cfbf80 R14: ffffea0001cfbfc0 R15: ffffffff8189de80
FS:  0000000000000000(0000) GS:ffff8800bfb60000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00005594f9d7e578 CR3: 0000000001808000 CR4: 00000000000006a0
Stack:
 ffff8800b9851a40 0000000000000000 0000000000000000 0000000000000000
 ffffffff811144b0 0000000000000000 ffffffff81115fb0 ffffea0001cfbfe0
 ffff8800b985fb30 ffff8800b985fb20 0000000000000000 ffff8800b985fb20
Call Trace:
 [<ffffffff811144b0>] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [<ffffffff81115fb0>] ? isolate_freepages_block+0x3d0/0x3d0
 [<ffffffff8111696b>] compact_zone+0x2bb/0x720
 [<ffffffff8142888d>] ? retint_kernel+0x10/0x10
 [<ffffffff81286b7d>] ? list_del+0xd/0x30
 [<ffffffff81116e3d>] compact_zone_order+0x6d/0xa0
 [<ffffffff8111708d>] try_to_compact_pages+0xed/0x200
 [<ffffffff81155243>] __alloc_pages_direct_compact+0x3b/0xd4
 [<ffffffff810f8f5b>] __alloc_pages_nodemask+0x3fb/0x920
 [<ffffffff81147b88>] khugepaged+0x158/0x1b90
 [<ffffffff81068e01>] ? hrtick_update+0x51/0x70
 [<ffffffff810744e0>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81147a30>] ? unfreeze_page+0x320/0x320
 [<ffffffff81058929>] kthread+0xc9/0xe0
 [<ffffffff81058860>] ? kthread_park+0x60/0x60
 [<ffffffff814280ef>] ret_from_fork+0x3f/0x70
 [<ffffffff81058860>] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 f4 fa ff ff 48 c7 c6 58 e9 77 81 4c 89 f7 e8 fa 2a fd ff <0f> 0b 48 83 e8 01 e9 d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff 
RIP  [<ffffffff81145c06>] migrate_pages+0x8e6/0x950
 RSP <ffff8800b985fa00>
---[ end trace 59eb35cc15af8a53 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

> 
> I have a vague suspicion that I'm screwing up anon_vma refcounting during
> split_huge_page.
> 
> It would be great to see if the page was part of THP before.
> 
> > 
> > ..
> > ..
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ffffea0001b81140 count:3 mapcount:1 mapping:ffff88007e806461 index:0x600001445
> > page:ffffea0001b87bc0 count:3 mapcount:1 mapping:ffff88007e806461 index:0x6000015ef
> > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(1)
> > page->mem_cgroup:ffff88007f2de000
> > ------------[ cut here ]------------
> > kernel BUG at mm/rmap.c:517!
> > invalid opcode: 0000 [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 0 PID: 24935 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-THP-ref-madv_free+ #1555
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff880000ce8000 ti: ffff8800ada28000 task.ti: ffff8800ada28000
> > RIP: 0010:[<ffffffff81128f6e>]  [<ffffffff81128f6e>] page_lock_anon_vma_read+0x18e/0x190
> > RSP: 0000:ffff8800ada2b868  EFLAGS: 00010296
> > RAX: 0000000000000021 RBX: ffffea0001b87bc0 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: 0000000000000282 RDI: ffffffff81830db0
> > RBP: ffff8800ada2b888 R08: 0000000000000021 R09: ffff8800ba40eb75
> > R10: 0000000001ff14bc R11: 0000000000000000 R12: ffff88007e806461
> > R13: ffff88007e806460 R14: 0000000000000000 R15: ffffffff818464c0
> > FS:  00007f6d93212740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000600003c14000 CR3: 00000000a674b000 CR4: 00000000000006b0
> > Stack:
> >  ffffea0001b87bc0 ffff8800ada2b8f8 ffff88007f2de000 0000000000000000
> >  ffff8800ada2b8d0 ffffffff81129593 ffff880000000000 ffffffff8105f8c0
> >  ffffea0001b87bc0 ffff8800ada2b9f8 ffff88007f2de000 0000000000000000
> > Call Trace:
> >  [<ffffffff81129593>] rmap_walk+0x1b3/0x3f0
> >  [<ffffffff8105f8c0>] ? finish_task_switch+0x70/0x260
> >  [<ffffffff81129973>] page_referenced+0x1a3/0x220
> >  [<ffffffff81127c10>] ? __page_check_address+0x1d0/0x1d0
> >  [<ffffffff81128de0>] ? page_get_anon_vma+0xd0/0xd0
> >  [<ffffffff81127580>] ? anon_vma_ctor+0x40/0x40
> >  [<ffffffff81103e9e>] shrink_page_list+0x5ce/0xdc0
> >  [<ffffffff81104d4c>] shrink_inactive_list+0x18c/0x4b0
> >  [<ffffffff811059af>] shrink_lruvec+0x58f/0x730
> >  [<ffffffff81105c24>] shrink_zone+0xd4/0x280
> >  [<ffffffff81105efd>] do_try_to_free_pages+0x12d/0x3b0
> >  [<ffffffff8110635d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> >  [<ffffffff8114e235>] try_charge+0x175/0x720
> >  [<ffffffff810fdf80>] ? __activate_page+0x230/0x230
> >  [<ffffffff81152005>] mem_cgroup_try_charge+0x85/0x1d0
> >  [<ffffffff8111e69a>] handle_mm_fault+0xc9a/0x1000
> >  [<ffffffff8106215b>] ? __set_cpus_allowed_ptr+0x9b/0x1a0
> >  [<ffffffff81033629>] __do_page_fault+0x189/0x400
> >  [<ffffffff810338ac>] do_page_fault+0xc/0x10
> >  [<ffffffff81428782>] page_fault+0x22/0x30
> > Code: c9 0f 84 b9 fe ff ff 8d 51 01 89 c8 f0 0f b1 16 39 c1 0f 84 11 ff ff ff 89 c1 eb e3 48 c7 c6 88 02 78 81 48 89 df e8 02 f3 fe ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 45 31 f6 41 55 4c 
> > RIP  [<ffffffff81128f6e>] page_lock_anon_vma_read+0x18e/0x190
> >  RSP <ffff8800ada2b868>
> > ---[ end trace cfbb87f54f12290e ]---
> > Kernel panic - not syncing: Fatal exception
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
> > 
> > On Tue, Oct 20, 2015 at 10:38:54AM +0900, Minchan Kim wrote:
> > > On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> > > > On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > > > > Hello, it's been a long time since I sent the previous patch.
> > > > > https://lkml.org/lkml/2015/6/3/37
> > > > > 
> > > > > This patch is almost new compared to the previous approach.
> > > > > I think this is simpler, clearer and easier to review.
> > > > > 
> > > > > One thing I should note is that I had tested this patch
> > > > > and couldn't find any critical problem, so I rebased the patchset
> > > > > onto recent mmotm (i.e., mmotm-2015-10-15-15-20) to send a formal
> > > > > patchset. Unfortunately, I started to see sudden discarding of
> > > > > pages that should not be discarded. IOW, an application's valid
> > > > > anonymous page suddenly disappeared.
> > > > > 
> > > > > When I looked through the THP changes, I thought we could lose
> > > > > the dirty bit of the pte between freeze_page and unfreeze_page
> > > > > when we mark it as a migration entry and restore it.
> > > > > So I added the simple code below without much consideration,
> > > > > and I cannot see the problem any more.
> > > > > I hope it's a good hint for finding the right fix for this problem.
> > > > > 
> > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > > index d5ea516ffb54..e881c04f5950 100644
> > > > > --- a/mm/huge_memory.c
> > > > > +++ b/mm/huge_memory.c
> > > > > @@ -3138,6 +3138,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
> > > > >  		if (is_write_migration_entry(swp_entry))
> > > > >  			entry = maybe_mkwrite(entry, vma);
> > > > >  
> > > > > +		if (PageDirty(page))
> > > > > +			SetPageDirty(page);
> > > > 
> > > > The PageDirty condition was a typo. I didn't add the condition.
> > > > I just added:
> > > > 
> > > >                 SetPageDirty(page);
> > > 
> > > As a first step toward finding this bug, I removed all MADV_FREE related
> > > code in mmotm-2015-10-15-15-20. IOW, git checkout 54bad5da4834
> > > (arm64: add pmd_[dirty|mkclean] for THP), so the tree doesn't have
> > > any core MADV_FREE code.
> > > 
> > > I tested the following workload in my KVM machine.
> > > 
> > > 0. make memcg
> > > 1. limit memcg
> > > 2. fork several processes
> > > 3. each process allocates THP pages and fills them
> > > 4. increase the limit of the memcg so that swapoff succeeds
> > > 5. swapoff
> > > 6. kill all of processes
> > > 7. goto 1
> > > 
> > > Within a few hours, I encountered the following bug.
> > > The detailed boot log and dmesg output are attached.
> > > 
> > > 
> > > Initializing cgroup subsys cpu
> > > Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > > KERNEL supported cpus:
> > >   Intel GenuineIntel
> > > x86/fpu: Legacy x87 FPU detected.
> > > x86/fpu: Using 'lazy' FPU context switches.
> > > e820: BIOS-provided physical RAM map:
> > > BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > > BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > > BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > > BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> > > BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> > > BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > > BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > > 
> > > <snip>
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > PGD 0 
> > > Oops: 0000 [#1] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
> > > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > RSP: 0018:ffff88007fea3648  EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
> > > RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
> > > RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
> > > R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
> > > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > > FS:  00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
> > > Stack:
> > >  ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
> > >  ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
> > >  ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
> > > Call Trace:
> > >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> > >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> > >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> > >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> > >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> > >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> > >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> > >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> > >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> > >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> > >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> > >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> > >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> > >  [<ffffffff811496c3>] try_charge+0x163/0x700
> > >  [<ffffffff81149cb4>] mem_cgroup_do_precharge+0x54/0x70
> > >  [<ffffffff81149e45>] mem_cgroup_can_attach+0x175/0x1b0
> > >  [<ffffffff811b2c57>] ? kernfs_iattrs.isra.6+0x37/0xd0
> > >  [<ffffffff81148e70>] ? get_mctgt_type+0x320/0x320
> > >  [<ffffffff810a9d29>] cgroup_migrate+0x149/0x440
> > >  [<ffffffff810aa60c>] cgroup_attach_task+0x7c/0xe0
> > >  [<ffffffff810aa904>] __cgroup_procs_write.isra.33+0x1d4/0x2b0
> > >  [<ffffffff810aaa10>] cgroup_tasks_write+0x10/0x20
> > >  [<ffffffff810a6238>] cgroup_file_write+0x38/0xf0
> > >  [<ffffffff811b54ad>] kernfs_fop_write+0x11d/0x170
> > >  [<ffffffff81153918>] __vfs_write+0x28/0xe0
> > >  [<ffffffff8116e614>] ? __fd_install+0x24/0xc0
> > >  [<ffffffff810784a1>] ? percpu_down_read+0x21/0x50
> > >  [<ffffffff81153e91>] vfs_write+0xa1/0x170
> > >  [<ffffffff81154716>] SyS_write+0x46/0xa0
> > >  [<ffffffff81420a17>] entry_SYSCALL_64_fastpath+0x12/0x6a
> > > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > >  RSP <ffff88007fea3648>
> > > CR2: 0000000000000008
> > > BUG: unable to handle kernel ---[ end trace e81a82c8122b447d ]---
> > > Kernel panic - not syncing: Fatal exception
> > > 
> > > NULL pointer dereference at 0000000000000008
> > > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > PGD 0 
> > > Oops: 0000 [#2] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 10 PID: 59 Comm: khugepaged Tainted: G      D         4.3.0-rc5-mm1-diet-meta+ #1545
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
> > > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
> > > RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
> > > RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
> > > R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
> > > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > > FS:  0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
> > > Stack:
> > >  ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
> > >  ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
> > >  ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
> > > Call Trace:
> > >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> > >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> > >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> > >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> > >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> > >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> > >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> > >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> > >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> > >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> > >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> > >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> > >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> > >  [<ffffffff811496c3>] try_charge+0x163/0x700
> > >  [<ffffffff8141d1f3>] ? schedule+0x33/0x80
> > >  [<ffffffff8114d45f>] mem_cgroup_try_charge+0x9f/0x1d0
> > >  [<ffffffff811434bc>] khugepaged+0x7cc/0x1ac0
> > >  [<ffffffff81066e01>] ? hrtick_update+0x1/0x70
> > >  [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
> > >  [<ffffffff81142cf0>] ? total_mapcount+0x70/0x70
> > >  [<ffffffff81056cd9>] kthread+0xc9/0xe0
> > >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> > >  [<ffffffff81420d6f>] ret_from_fork+0x3f/0x70
> > >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> > > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > >  RSP <ffff8800b985f778>
> > > CR2: 0000000000000008
> > > ---[ end trace e81a82c8122b447e ]---
> > > Shutting down cpus with NMI
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Kernel Offset: disabled
> > > 
> > 
> > > QEMU 2.0.0 monitor - type 'help' for more information
> > > (qemu) s^[[Kearly console in setup code
> > > Initializing cgroup subsys cpu
> > > Linux version 4.3.0-rc5-mm1-diet-meta+ (barrios@bbox) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) ) #1545 SMP Tue Oct 20 08:55:45 KST 2015
> > > Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > > KERNEL supported cpus:
> > >   Intel GenuineIntel
> > > x86/fpu: Legacy x87 FPU detected.
> > > x86/fpu: Using 'lazy' FPU context switches.
> > > e820: BIOS-provided physical RAM map:
> > > BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > > BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > > BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > > BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> > > BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> > > BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > > BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > > bootconsole [earlyser0] enabled
> > > debug: ignoring loglevel setting.
> > > NX (Execute Disable) protection: active
> > > SMBIOS 2.4 present.
> > > DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> > > e820: remove [mem 0x000a0000-0x000fffff] usable
> > > e820: last_pfn = 0xbfffc max_arch_pfn = 0x400000000
> > > MTRR default type: write-back
> > > MTRR fixed ranges enabled:
> > >   00000-9FFFF write-back
> > >   A0000-BFFFF uncachable
> > >   C0000-FFFFF write-protect
> > > MTRR variable ranges enabled:
> > >   0 base 00C0000000 mask FFC0000000 uncachable
> > >   1 disabled
> > >   2 disabled
> > >   3 disabled
> > >   4 disabled
> > >   5 disabled
> > >   6 disabled
> > >   7 disabled
> > > x86/PAT: PAT not supported by CPU.
> > > Scan for SMP in [mem 0x00000000-0x000003ff]
> > > Scan for SMP in [mem 0x0009fc00-0x0009ffff]
> > > Scan for SMP in [mem 0x000f0000-0x000fffff]
> > > found SMP MP-table at [mem 0x000f0a70-0x000f0a7f] mapped at [ffff8800000f0a70]
> > >   mpc: f0a80-f0c44
> > > Scanning 1 areas for low memory corruption
> > > Base memory trampoline at [ffff880000099000] 99000 size 24576
> > > init_memory_mapping: [mem 0x00000000-0x000fffff]
> > >  [mem 0x00000000-0x000fffff] page 4k
> > > BRK [0x0220e000, 0x0220efff] PGTABLE
> > > BRK [0x0220f000, 0x0220ffff] PGTABLE
> > > BRK [0x02210000, 0x02210fff] PGTABLE
> > > init_memory_mapping: [mem 0xbfc00000-0xbfdfffff]
> > >  [mem 0xbfc00000-0xbfdfffff] page 2M
> > > BRK [0x02211000, 0x02211fff] PGTABLE
> > > init_memory_mapping: [mem 0xa0000000-0xbfbfffff]
> > >  [mem 0xa0000000-0xbfbfffff] page 2M
> > > init_memory_mapping: [mem 0x80000000-0x9fffffff]
> > >  [mem 0x80000000-0x9fffffff] page 2M
> > > init_memory_mapping: [mem 0x00100000-0x7fffffff]
> > >  [mem 0x00100000-0x001fffff] page 4k
> > >  [mem 0x00200000-0x7fffffff] page 2M
> > > init_memory_mapping: [mem 0xbfe00000-0xbfffbfff]
> > >  [mem 0xbfe00000-0xbfffbfff] page 4k
> > > BRK [0x02212000, 0x02212fff] PGTABLE
> > > RAMDISK: [mem 0x7851a000-0x7fffffff]
> > >  [ffffea0000000000-ffffea0002ffffff] PMD -> [ffff8800bc400000-ffff8800bf3fffff] on node 0
> > > Zone ranges:
> > >   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> > >   DMA32    [mem 0x0000000001000000-0x00000000bfffbfff]
> > >   Normal   empty
> > > Movable zone start for each node
> > > Early memory node ranges
> > >   node   0: [mem 0x0000000000001000-0x000000000009efff]
> > >   node   0: [mem 0x0000000000100000-0x00000000bfffbfff]
> > > Initmem setup node 0 [mem 0x0000000000001000-0x00000000bfffbfff]
> > > On node 0 totalpages: 786330
> > >   DMA zone: 64 pages used for memmap
> > >   DMA zone: 21 pages reserved
> > >   DMA zone: 3998 pages, LIFO batch:0
> > >   DMA32 zone: 12224 pages used for memmap
> > >   DMA32 zone: 782332 pages, LIFO batch:31
> > > Intel MultiProcessor Specification v1.4
> > >   mpc: f0a80-f0c44
> > > MPTABLE: OEM ID: BOCHSCPU
> > > MPTABLE: Product ID: 0.1         
> > > MPTABLE: APIC at: 0xFEE00000
> > > mapped APIC to ffffffffff5fd000 (        fee00000)
> > > Processor #0 (Bootup-CPU)
> > > Processor #1
> > > Processor #2
> > > Processor #3
> > > Processor #4
> > > Processor #5
> > > Processor #6
> > > Processor #7
> > > Processor #8
> > > Processor #9
> > > Processor #10
> > > Processor #11
> > > Bus #0 is PCI   
> > > Bus #1 is ISA   
> > > IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 09
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 0c, APIC ID 0, APIC INT 0b
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 10, APIC ID 0, APIC INT 0b
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 14, APIC ID 0, APIC INT 0a
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 18, APIC ID 0, APIC INT 0a
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC INT 02
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 01, APIC ID 0, APIC INT 01
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 03, APIC ID 0, APIC INT 03
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 04, APIC ID 0, APIC INT 04
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 06, APIC ID 0, APIC INT 06
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 07, APIC ID 0, APIC INT 07
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 08, APIC ID 0, APIC INT 08
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0c, APIC ID 0, APIC INT 0c
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0d, APIC ID 0, APIC INT 0d
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0e, APIC ID 0, APIC INT 0e
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0f, APIC ID 0, APIC INT 0f
> > > Lint: type 3, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC LINT 00
> > > Lint: type 1, pol 0, trig 0, bus 01, IRQ 00, APIC ID ff, APIC LINT 01
> > > Processors: 12
> > > smpboot: Allowing 12 CPUs, 0 hotplug CPUs
> > > mapped IOAPIC to ffffffffff5fc000 (fec00000)
> > > e820: [mem 0xc0000000-0xfeffbfff] available for PCI devices
> > > clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > > setup_percpu: NR_CPUS:16 nr_cpumask_bits:16 nr_cpu_ids:12 nr_node_ids:1
> > > PERCPU: Embedded 31 pages/cpu @ffff8800bfa00000 s87640 r8192 d31144 u131072
> > > pcpu-alloc: s87640 r8192 d31144 u131072 alloc=1*2097152
> > > pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 -- -- -- -- 
> > > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 774021
> > > Kernel command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > > sysrq: sysrq always enabled.
> > > log_buf_len individual max cpu contribution: 2097152 bytes
> > > log_buf_len total cpu_extra contributions: 23068672 bytes
> > > log_buf_len min size: 8388608 bytes
> > > log_buf_len: 33554432 bytes
> > > early log buf free: 8380096(99%)
> > > PID hash table entries: 4096 (order: 3, 32768 bytes)
> > > Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> > > Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> > > Memory: 2911172K/3145320K available (4237K kernel code, 721K rwdata, 1988K rodata, 936K init, 8608K bss, 234148K reserved, 0K cma-reserved)
> > > SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=12, Nodes=1
> > > Hierarchical RCU implementation.
> > > 	Build-time adjustment of leaf fanout to 64.
> > > 	RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=12.
> > > RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=12
> > > NR_IRQS:4352 nr_irqs:136 16
> > > Console: colour VGA+ 80x25
> > > console [tty0] enabled
> > > bootconsole [earlyser0] disabled
> > > Initializing cgroup subsys cpu
> > > Linux version 4.3.0-rc5-mm1-diet-meta+ (barrios@bbox) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) ) #1545 SMP Tue Oct 20 08:55:45 KST 2015
> > > Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > > KERNEL supported cpus:
> > >   Intel GenuineIntel
> > > x86/fpu: Legacy x87 FPU detected.
> > > x86/fpu: Using 'lazy' FPU context switches.
> > > e820: BIOS-provided physical RAM map:
> > > BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > > BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > > BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > > BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
> > > BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
> > > BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > > BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > > bootconsole [earlyser0] enabled
> > > debug: ignoring loglevel setting.
> > > NX (Execute Disable) protection: active
> > > SMBIOS 2.4 present.
> > > DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> > > e820: remove [mem 0x000a0000-0x000fffff] usable
> > > e820: last_pfn = 0xbfffc max_arch_pfn = 0x400000000
> > > MTRR default type: write-back
> > > MTRR fixed ranges enabled:
> > >   00000-9FFFF write-back
> > >   A0000-BFFFF uncachable
> > >   C0000-FFFFF write-protect
> > > MTRR variable ranges enabled:
> > >   0 base 00C0000000 mask FFC0000000 uncachable
> > >   1 disabled
> > >   2 disabled
> > >   3 disabled
> > >   4 disabled
> > >   5 disabled
> > >   6 disabled
> > >   7 disabled
> > > x86/PAT: PAT not supported by CPU.
> > > Scan for SMP in [mem 0x00000000-0x000003ff]
> > > Scan for SMP in [mem 0x0009fc00-0x0009ffff]
> > > Scan for SMP in [mem 0x000f0000-0x000fffff]
> > > found SMP MP-table at [mem 0x000f0a70-0x000f0a7f] mapped at [ffff8800000f0a70]
> > >   mpc: f0a80-f0c44
> > > Scanning 1 areas for low memory corruption
> > > Base memory trampoline at [ffff880000099000] 99000 size 24576
> > > init_memory_mapping: [mem 0x00000000-0x000fffff]
> > >  [mem 0x00000000-0x000fffff] page 4k
> > > BRK [0x0220e000, 0x0220efff] PGTABLE
> > > BRK [0x0220f000, 0x0220ffff] PGTABLE
> > > BRK [0x02210000, 0x02210fff] PGTABLE
> > > init_memory_mapping: [mem 0xbfc00000-0xbfdfffff]
> > >  [mem 0xbfc00000-0xbfdfffff] page 2M
> > > BRK [0x02211000, 0x02211fff] PGTABLE
> > > init_memory_mapping: [mem 0xa0000000-0xbfbfffff]
> > >  [mem 0xa0000000-0xbfbfffff] page 2M
> > > init_memory_mapping: [mem 0x80000000-0x9fffffff]
> > >  [mem 0x80000000-0x9fffffff] page 2M
> > > init_memory_mapping: [mem 0x00100000-0x7fffffff]
> > >  [mem 0x00100000-0x001fffff] page 4k
> > >  [mem 0x00200000-0x7fffffff] page 2M
> > > init_memory_mapping: [mem 0xbfe00000-0xbfffbfff]
> > >  [mem 0xbfe00000-0xbfffbfff] page 4k
> > > BRK [0x02212000, 0x02212fff] PGTABLE
> > > RAMDISK: [mem 0x7851a000-0x7fffffff]
> > >  [ffffea0000000000-ffffea0002ffffff] PMD -> [ffff8800bc400000-ffff8800bf3fffff] on node 0
> > > Zone ranges:
> > >   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> > >   DMA32    [mem 0x0000000001000000-0x00000000bfffbfff]
> > >   Normal   empty
> > > Movable zone start for each node
> > > Early memory node ranges
> > >   node   0: [mem 0x0000000000001000-0x000000000009efff]
> > >   node   0: [mem 0x0000000000100000-0x00000000bfffbfff]
> > > Initmem setup node 0 [mem 0x0000000000001000-0x00000000bfffbfff]
> > > On node 0 totalpages: 786330
> > >   DMA zone: 64 pages used for memmap
> > >   DMA zone: 21 pages reserved
> > >   DMA zone: 3998 pages, LIFO batch:0
> > >   DMA32 zone: 12224 pages used for memmap
> > >   DMA32 zone: 782332 pages, LIFO batch:31
> > > Intel MultiProcessor Specification v1.4
> > >   mpc: f0a80-f0c44
> > > MPTABLE: OEM ID: BOCHSCPU
> > > MPTABLE: Product ID: 0.1         
> > > MPTABLE: APIC at: 0xFEE00000
> > > mapped APIC to ffffffffff5fd000 (        fee00000)
> > > Processor #0 (Bootup-CPU)
> > > Processor #1
> > > Processor #2
> > > Processor #3
> > > Processor #4
> > > Processor #5
> > > Processor #6
> > > Processor #7
> > > Processor #8
> > > Processor #9
> > > Processor #10
> > > Processor #11
> > > Bus #0 is PCI   
> > > Bus #1 is ISA   
> > > IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 09
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 0c, APIC ID 0, APIC INT 0b
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 10, APIC ID 0, APIC INT 0b
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 14, APIC ID 0, APIC INT 0a
> > > Int: type 0, pol 1, trig 0, bus 00, IRQ 18, APIC ID 0, APIC INT 0a
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC INT 02
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 01, APIC ID 0, APIC INT 01
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 03, APIC ID 0, APIC INT 03
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 04, APIC ID 0, APIC INT 04
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 06, APIC ID 0, APIC INT 06
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 07, APIC ID 0, APIC INT 07
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 08, APIC ID 0, APIC INT 08
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0c, APIC ID 0, APIC INT 0c
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0d, APIC ID 0, APIC INT 0d
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0e, APIC ID 0, APIC INT 0e
> > > Int: type 0, pol 0, trig 0, bus 01, IRQ 0f, APIC ID 0, APIC INT 0f
> > > Lint: type 3, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC LINT 00
> > > Lint: type 1, pol 0, trig 0, bus 01, IRQ 00, APIC ID ff, APIC LINT 01
> > > Processors: 12
> > > smpboot: Allowing 12 CPUs, 0 hotplug CPUs
> > > mapped IOAPIC to ffffffffff5fc000 (fec00000)
> > > e820: [mem 0xc0000000-0xfeffbfff] available for PCI devices
> > > clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > > setup_percpu: NR_CPUS:16 nr_cpumask_bits:16 nr_cpu_ids:12 nr_node_ids:1
> > > PERCPU: Embedded 31 pages/cpu @ffff8800bfa00000 s87640 r8192 d31144 u131072
> > > pcpu-alloc: s87640 r8192 d31144 u131072 alloc=1*2097152
> > > pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 -- -- -- -- 
> > > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 774021
> > > Kernel command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
> > > sysrq: sysrq always enabled.
> > > log_buf_len individual max cpu contribution: 2097152 bytes
> > > log_buf_len total cpu_extra contributions: 23068672 bytes
> > > log_buf_len min size: 8388608 bytes
> > > log_buf_len: 33554432 bytes
> > > early log buf free: 8380096(99%)
> > > PID hash table entries: 4096 (order: 3, 32768 bytes)
> > > Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> > > Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> > > Memory: 2911172K/3145320K available (4237K kernel code, 721K rwdata, 1988K rodata, 936K init, 8608K bss, 234148K reserved, 0K cma-reserved)
> > > SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=12, Nodes=1
> > > Hierarchical RCU implementation.
> > > 	Build-time adjustment of leaf fanout to 64.
> > > 	RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=12.
> > > RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=12
> > > NR_IRQS:4352 nr_irqs:136 16
> > > Console: colour VGA+ 80x25
> > > console [tty0] enabled
> > > bootconsole [earlyser0] disabled
> > > console [ttyS0] enabled
> > > tsc: Fast TSC calibration using PIT
> > > tsc: Detected 3199.926 MHz processor
> > > Calibrating delay loop (skipped), value calculated using timer frequency.. 6399.85 BogoMIPS (lpj=12799704)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
> > > Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
> > > Initializing cgroup subsys memory
> > > Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> > > Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> > > Freeing SMP alternatives memory: 20K (ffffffff819a0000 - ffffffff819a5000)
> > > ftrace: allocating 16664 entries in 66 pages
> > > Switched APIC routing to physical flat.
> > > enabled ExtINT on CPU#0
> > > ENABLING IO-APIC IRQs
> > > init IO_APIC IRQs
> > >  apic 0 pin 0 not connected
> > > IOAPIC[0]: Set routing entry (0-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-4 -> 0x34 -> IRQ 4 Mode:0 Active:0 Dest:0)
> > >  apic 0 pin 5 not connected
> > > IOAPIC[0]: Set routing entry (0-6 -> 0x36 -> IRQ 6 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-7 -> 0x37 -> IRQ 7 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-8 -> 0x38 -> IRQ 8 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-9 -> 0x39 -> IRQ 9 Mode:1 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-10 -> 0x3a -> IRQ 10 Mode:1 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-11 -> 0x3b -> IRQ 11 Mode:1 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-12 -> 0x3c -> IRQ 12 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-13 -> 0x3d -> IRQ 13 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-14 -> 0x3e -> IRQ 14 Mode:0 Active:0 Dest:0)
> > > IOAPIC[0]: Set routing entry (0-15 -> 0x3f -> IRQ 15 Mode:0 Active:0 Dest:0)
> > >  apic 0 pin 16 not connected
> > >  apic 0 pin 17 not connected
> > >  apic 0 pin 18 not connected
> > >  apic 0 pin 19 not connected
> > >  apic 0 pin 20 not connected
> > >  apic 0 pin 21 not connected
> > >  apic 0 pin 22 not connected
> > >  apic 0 pin 23 not connected
> > > ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> > > Using local APIC timer interrupts.
> > > calibrating APIC timer ...
> > > ... lapic delta = 6251755
> > > ..... delta 6251755
> > > ..... mult: 268510832
> > > ..... calibration result: 4001123
> > > ..... CPU clock speed is 3200.3592 MHz.
> > > ..... host bus clock speed is 1000.1123 MHz.
> > > ... verify APIC timer
> > > ... jiffies delta = 25
> > > ... jiffies result ok
> > > smpboot: CPU0: Intel QEMU Virtual CPU version 2.0.0 (family: 0x6, model: 0x6, stepping: 0x3)
> > > Performance Events: Broken PMU hardware detected, using software events only.
> > > Failed to access perfctr msr (MSR c2 is 0)
> > > x86: Booting SMP configuration:
> > > .... node  #0, CPUs:        #1
> > > masked ExtINT on CPU#1
> > >   #2
> > > masked ExtINT on CPU#2
> > >   #3
> > > masked ExtINT on CPU#3
> > >   #4
> > > masked ExtINT on CPU#4
> > >   #5
> > > masked ExtINT on CPU#5
> > >   #6
> > > masked ExtINT on CPU#6
> > >   #7
> > > masked ExtINT on CPU#7
> > >   #8
> > > masked ExtINT on CPU#8
> > >   #9
> > > masked ExtINT on CPU#9
> > >  #10
> > > masked ExtINT on CPU#10
> > >  #11
> > > masked ExtINT on CPU#11
> > > x86: Booted up 1 node, 12 CPUs
> > > smpboot: Total of 12 processors activated (76818.13 BogoMIPS)
> > > devtmpfs: initialized
> > > clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> > > NET: Registered protocol family 16
> > > PCI: Using configuration type 1 for base access
> > > vgaarb: loaded
> > > SCSI subsystem initialized
> > > libata version 3.00 loaded.
> > > PCI: Probing PCI hardware
> > > PCI: root bus 00: using default resources
> > > PCI: Probing PCI hardware (bus 00)
> > > PCI host bridge to bus 0000:00
> > > pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
> > > pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffffff]
> > > pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
> > > pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
> > > pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
> > > pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
> > > pci 0000:00:01.1: reg 0x20: [io  0xc0c0-0xc0cf]
> > > pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
> > > pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
> > > pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
> > > pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
> > > pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
> > > pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
> > > pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
> > > pci 0000:00:02.0: reg 0x14: [mem 0xfebd0000-0xfebd0fff]
> > > pci 0000:00:02.0: reg 0x30: [mem 0xfebc0000-0xfebcffff pref]
> > > vgaarb: setting as boot device: PCI:0000:00:02.0
> > > vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> > > pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
> > > pci 0000:00:03.0: reg 0x10: [io  0xc080-0xc09f]
> > > pci 0000:00:03.0: reg 0x14: [mem 0xfebd1000-0xfebd1fff]
> > > pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref]
> > > pci 0000:00:04.0: [1af4:1002] type 00 class 0x00ff00
> > > pci 0000:00:04.0: reg 0x10: [io  0xc0a0-0xc0bf]
> > > pci 0000:00:05.0: [1af4:1001] type 00 class 0x010000
> > > pci 0000:00:05.0: reg 0x10: [io  0xc000-0xc03f]
> > > pci 0000:00:05.0: reg 0x14: [mem 0xfebd2000-0xfebd2fff]
> > > pci 0000:00:06.0: [1af4:1001] type 00 class 0x010000
> > > pci 0000:00:06.0: reg 0x10: [io  0xc040-0xc07f]
> > > pci 0000:00:06.0: reg 0x14: [mem 0xfebd3000-0xfebd3fff]
> > > pci 0000:00:07.0: [8086:25ab] type 00 class 0x088000
> > > pci 0000:00:07.0: reg 0x10: [mem 0xfebd4000-0xfebd400f]
> > > pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
> > > pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> > > PCI: pci_cache_line_size set to 64 bytes
> > > e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
> > > e820: reserve RAM buffer [mem 0xbfffc000-0xbfffffff]
> > > clocksource: Switched to clocksource refined-jiffies
> > > pci_bus 0000:00: resource 4 [io  0x0000-0xffff]
> > > pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffffff]
> > > NET: Registered protocol family 2
> > > TCP established hash table entries: 32768 (order: 6, 262144 bytes)
> > > TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
> > > TCP: Hash tables configured (established 32768 bind 32768)
> > > UDP hash table entries: 2048 (order: 4, 65536 bytes)
> > > UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
> > > NET: Registered protocol family 1
> > > Trying to unpack rootfs image as initramfs...
> > > Freeing initrd memory: 125848K (ffff88007851a000 - ffff880080000000)
> > > platform rtc_cmos: registered platform RTC device (no PNP device found)
> > > Scanning for low memory corruption every 60 seconds
> > > futex hash table entries: 4096 (order: 6, 262144 bytes)
> > > HugeTLB registered 2 MB page size, pre-allocated 0 pages
> > > fuse init (API version 7.23)
> > > 9p: Installing v9fs 9p2000 file system support
> > > cryptomgr_test (74) used greatest stack depth: 15352 bytes left
> > > cryptomgr_test (82) used greatest stack depth: 15136 bytes left
> > > Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
> > > io scheduler noop registered
> > > io scheduler deadline registered
> > > io scheduler cfq registered (default)
> > > querying PCI -> IRQ mapping bus:0, slot:3, pin:0.
> > > virtio-pci 0000:00:03.0: PCI->APIC IRQ transform: INT A -> IRQ 11
> > > virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
> > > querying PCI -> IRQ mapping bus:0, slot:4, pin:0.
> > > virtio-pci 0000:00:04.0: PCI->APIC IRQ transform: INT A -> IRQ 11
> > > virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
> > > querying PCI -> IRQ mapping bus:0, slot:5, pin:0.
> > > virtio-pci 0000:00:05.0: PCI->APIC IRQ transform: INT A -> IRQ 10
> > > virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
> > > querying PCI -> IRQ mapping bus:0, slot:6, pin:0.
> > > virtio-pci 0000:00:06.0: PCI->APIC IRQ transform: INT A -> IRQ 10
> > > virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
> > > Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
> > > serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > Linux agpgart interface v0.103
> > > brd: module loaded
> > > loop: module loaded
> > >  vda: vda1 vda2 < vda5 >
> > > zram: Added device: zram0
> > > libphy: Fixed MDIO Bus: probed
> > > tun: Universal TUN/TAP device driver, 1.6
> > > tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> > > serio: i8042 KBD port at 0x60,0x64 irq 1
> > > serio: i8042 AUX port at 0x60,0x64 irq 12
> > > mousedev: PS/2 mouse device common for all mice
> > > rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> > > rtc_cmos rtc_cmos: alarms up to one day, 114 bytes nvram
> > > device-mapper: ioctl: 4.33.0-ioctl (2015-8-18) initialised: dm-devel@redhat.com
> > > device-mapper: cache cleaner: version 1.0.0 loaded
> > > NET: Registered protocol family 17
> > > 9pnet: Installing 9P2000 support
> > > ... APIC ID:      00000000 (0)
> > > ... APIC VERSION: 01050014
> > > 0000000000000000000000000000000000000000000000000000000000000000
> > > 000000000e000000000000000000000000000000000000000000000000000000
> > > 0000000000020000000000000000000000000000000000000000000000008000
> > > 
> > > number of MP IRQ sources: 16.
> > > number of IO-APIC #0 registers: 24.
> > > testing the IO APIC.......................
> > > IO APIC #0......
> > > .... register #00: 00000000
> > > .......    : physical APIC id: 00
> > > .......    : Delivery Type: 0
> > > .......    : LTS          : 0
> > > .... register #01: 00170011
> > > .......     : max redirection entries: 17
> > > .......     : PRQ implemented: 0
> > > .......     : IO APIC version: 11
> > > .... register #02: 00000000
> > > .......     : arbitration: 00
> > > .... IRQ redirection table:
> > > IOAPIC 0:
> > >  pin00, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin01, enabled , edge , high, V(31), IRR(0), S(0), physical, D(00), M(0)
> > >  pin02, enabled , edge , high, V(30), IRR(0), S(0), physical, D(00), M(0)
> > >  pin03, enabled , edge , high, V(33), IRR(0), S(0), physical, D(00), M(0)
> > >  pin04, disabled, edge , high, V(34), IRR(0), S(0), physical, D(00), M(0)
> > >  pin05, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin06, enabled , edge , high, V(36), IRR(0), S(0), physical, D(00), M(0)
> > >  pin07, enabled , edge , high, V(37), IRR(0), S(0), physical, D(00), M(0)
> > >  pin08, enabled , edge , high, V(38), IRR(0), S(0), physical, D(00), M(0)
> > >  pin09, disabled, level, high, V(39), IRR(0), S(0), physical, D(00), M(0)
> > >  pin0a, enabled , level, high, V(3A), IRR(0), S(0), physical, D(00), M(0)
> > >  pin0b, enabled , level, high, V(3B), IRR(0), S(0), physical, D(00), M(0)
> > >  pin0c, enabled , edge , high, V(3C), IRR(0), S(0), physical, D(00), M(0)
> > >  pin0d, enabled , edge , high, V(3D), IRR(0), S(0), physical, D(00), M(0)
> > >  pin0e, enabled , edge , high, V(3E), IRR(0), S(0), physical, D(00), M(0)
> > >  pin0f, enabled , edge , high, V(3F), IRR(0), S(0), physical, D(00), M(0)
> > >  pin10, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin11, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin12, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin13, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin14, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin15, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin16, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > >  pin17, disabled, edge , high, V(00), IRR(0), S(0), physical, D(00), M(0)
> > > IRQ to pin mappings:
> > > IRQ0 -> 0:2
> > > IRQ1 -> 0:1
> > > IRQ3 -> 0:3
> > > IRQ4 -> 0:4
> > > IRQ6 -> 0:6
> > > IRQ7 -> 0:7
> > > IRQ8 -> 0:8
> > > IRQ9 -> 0:9
> > > IRQ10 -> 0:10
> > > IRQ11 -> 0:11
> > > IRQ12 -> 0:12
> > > IRQ13 -> 0:13
> > > IRQ14 -> 0:14
> > > IRQ15 -> 0:15
> > > .................................... done.
> > > rtc_cmos rtc_cmos: setting system clock to 2015-10-20 08:57:55 UTC (1445331475)
> > > input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> > > Freeing unused kernel memory: 936K (ffffffff818b6000 - ffffffff819a0000)
> > > Write protecting the kernel read-only data: 8192k
> > > Freeing unused kernel memory: 1900K (ffff880001425000 - ffff880001600000)
> > > Freeing unused kernel memory: 60K (ffff8800017f1000 - ffff880001800000)
> > > busybox (117) used greatest stack depth: 14480 bytes left
> > > exe (124) used greatest stack depth: 14024 bytes left
> > > udevd[140]: starting version 175
> > > blkid (151) used greatest stack depth: 13920 bytes left
> > > modprobe (242) used greatest stack depth: 13784 bytes left
> > > clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2e200418439, max_idle_ns: 440795220848 ns
> > > clocksource: Switched to clocksource tsc
> > > EXT4-fs (vda1): recovery complete
> > > EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
> > > exe (262) used greatest stack depth: 13032 bytes left
> > > random: init urandom read with 9 bits of entropy available
> > > init: plymouth-upstart-bridge main process (279) terminated with status 1
> > > init: plymouth-upstart-bridge main process ended, respawning
> > > init: plymouth-upstart-bridge main process (289) terminated with status 1
> > > init: plymouth-upstart-bridge main process ended, respawning
> > > init: plymouth-upstart-bridge main process (293) terminated with status 1
> > > init: plymouth-upstart-bridge main process ended, respawning
> > > init: ureadahead main process (282) terminated with status 5
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > systemd-udevd[423]: starting version 204
> > > EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
> > >  * Stopping Send an event to indicate plymouth is up^[[74G[ OK ]
> > >  * Starting Mount filesystems on boot^[[74G[ OK ]
> > >  * Starting Signal sysvinit that the rootfs is mounted^[[74G[ OK ]
> > >  * Starting Populate /dev filesystem^[[74G[ OK ]
> > >  * Starting Populate and link to /run filesystem^[[74G[ OK ]
> > >  * Stopping Populate /dev filesystem^[[74G[ OK ]
> > >  * Stopping Populate and link to /run filesystem^[[74G[ OK ]
> > >  * Starting Clean /tmp directory^[[74G[ OK ]
> > >  * Stopping Track if upstart is running in a container^[[74G[ OK ]
> > >  * Stopping Clean /tmp directory^[[74G[ OK ]
> > >  * Starting Initialize or finalize resolvconf^[[74G[ OK ]
> > >  * Starting set console keymap^[[74G[ OK ]
> > >  * Starting Signal sysvinit that virtual filesystems are mounted^[[74G[ OK ]
> > >  * Starting Signal sysvinit that virtual filesystems are mounted^[[74G[ OK ]
> > >  * Starting Bridge udev events into upstart^[[74G[ OK ]
> > >  * Starting Signal sysvinit that remote filesystems are mounted^[[74G[ OK ]
> > >  * Stopping set console keymap^[[74G[ OK ]
> > >  * Starting device node and kernel event manager^[[74G[ OK ]
> > >  * Starting load modules from /etc/modules^[[74G[ OK ]
> > >  * Starting cold plug devices^[[74G[ OK ]
> > >  * Starting log initial device creation^[[74G[ OK ]
> > >  * Stopping Read required files in advance (for other mountpoints)^[[74G[ OK ]
> > >  * Stopping load modules from /etc/modules^[[74G[ OK ]
> > >  * Starting Signal sysvinit that local filesystems are mounted^[[74G[ OK ]
> > >  * Starting flush early job output to logs^[[74G[ OK ]
> > >  * Stopping Mount filesystems on boot^[[74G[ OK ]
> > >  * Stopping flush early job output to logs^[[74G[ OK ]
> > >  * Starting D-Bus system message bus^[[74G[ OK ]
> > >  * Starting SystemD login management service^[[74G[ OK ]
> > >  * Starting system logging daemon^[[74G[ OK ]
> > >  * Stopping cold plug devices^[[74G[ OK ]
> > >  * Starting Uncomplicated firewall^[[74G[ OK ]
> > >  * Starting configure network device security^[[74G[ OK ]
> > >  * Stopping log initial device creation^[[74G[ OK ]
> > >  * Starting configure network device security^[[74G[ OK ]
> > >  * Starting save udev log and update rules^[[74G[ OK ]
> > >  * Starting set console font^[[74G[ OK ]
> > >  * Stopping save udev log and update rules^[[74G[ OK ]
> > >  * Starting Mount network filesystems^[[74G[ OK ]
> > >  * Starting Failsafe Boot Delay^[[74G[ OK ]
> > >  * Starting configure network device security^[[74G[ OK ]
> > >  * Stopping Mount network filesystems^[[74G[ OK ]
> > >  * Starting configure network device^[[74G[ OK ]
> > >  * Starting configure network device^[[74G[ OK ]
> > >  * Starting Bridge file events into upstart^[[74G[ OK ]
> > >  * Starting Bridge socket events into upstart^[[74G[ OK ]
> > >  * Stopping set console font^[[74G[ OK ]
> > >  * Starting userspace bootsplash^[[74G[ OK ]
> > >  * Starting Send an event to indicate plymouth is up^[[74G[ OK ]
> > >  * Stopping userspace bootsplash^[[74G[ OK ]
> > >  * Stopping Send an event to indicate plymouth is up^[[74G[ OK ]
> > >  * Starting Mount network filesystems^[[74G[ OK ]
> > > init: failsafe main process (591) killed by TERM signal
> > >  * Stopping Failsafe Boot Delay^[[74G[ OK ]
> > >  * Starting System V initialisation compatibility^[[74G[ OK ]
> > >  * Stopping Mount network filesystems^[[74G[ OK ]
> > >  * Starting configure virtual network devices^[[74G[ OK ]
> > >  * Stopping System V initialisation compatibility^[[74G[ OK ]
> > >  * Starting System V runlevel compatibility^[[74G[ OK ]
> > >  * Starting deferred execution scheduler^[[74G[ OK ]
> > >  * Starting regular background program processing daemon^[[74G[ OK ]
> > >  * Starting ACPI daemon^[[74G[ OK ]
> > >  * Starting save kernel messages^[[74G[ OK ]
> > >  * Starting CPU interrupts balancing daemon^[[74G[ OK ]
> > >  * Stopping save kernel messages^[[74G[ OK ]
> > >  * Starting OpenSSH server^[[74G[ OK ]
> > >  * Starting automatic crash report generation^[[74G[ OK ]
> > >  * Restoring resolver state...       ^[[80G ^[[74G[ OK ]
> > > eth0 Link encap:Ethernet HWaddr 52:54:79:12:34:57 inet addr:192.168.0.21 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:34 errors:0 dropped:24 overruns:0 frame:0 TX packets:4 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:5780 (5.7 KB) TX bytes:800 (800.0 B) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
> > >  * Stopping System V runlevel compatibility^[[74G[ OK ]
> > > init: plymouth-upstart-bridge main process ended, respawning
> > > sh (1429) used greatest stack depth: 11752 bytes left
> > > sh (1454) used greatest stack depth: 11528 bytes left
> > > random: nonblocking pool is initialized
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > sh (2785) used greatest stack depth: 11480 bytes left
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > PGD 0 
> > > Oops: 0000 [#1] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
> > > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > RSP: 0018:ffff88007fea3648  EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
> > > RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
> > > RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
> > > R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
> > > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > > FS:  00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
> > > Stack:
> > >  ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
> > >  ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
> > >  ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
> > > Call Trace:
> > >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> > >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> > >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> > >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> > >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> > >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> > >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> > >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> > >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> > >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> > >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> > >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> > >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> > >  [<ffffffff811496c3>] try_charge+0x163/0x700
> > >  [<ffffffff81149cb4>] mem_cgroup_do_precharge+0x54/0x70
> > >  [<ffffffff81149e45>] mem_cgroup_can_attach+0x175/0x1b0
> > >  [<ffffffff811b2c57>] ? kernfs_iattrs.isra.6+0x37/0xd0
> > >  [<ffffffff81148e70>] ? get_mctgt_type+0x320/0x320
> > >  [<ffffffff810a9d29>] cgroup_migrate+0x149/0x440
> > >  [<ffffffff810aa60c>] cgroup_attach_task+0x7c/0xe0
> > >  [<ffffffff810aa904>] __cgroup_procs_write.isra.33+0x1d4/0x2b0
> > >  [<ffffffff810aaa10>] cgroup_tasks_write+0x10/0x20
> > >  [<ffffffff810a6238>] cgroup_file_write+0x38/0xf0
> > >  [<ffffffff811b54ad>] kernfs_fop_write+0x11d/0x170
> > >  [<ffffffff81153918>] __vfs_write+0x28/0xe0
> > >  [<ffffffff8116e614>] ? __fd_install+0x24/0xc0
> > >  [<ffffffff810784a1>] ? percpu_down_read+0x21/0x50
> > >  [<ffffffff81153e91>] vfs_write+0xa1/0x170
> > >  [<ffffffff81154716>] SyS_write+0x46/0xa0
> > >  [<ffffffff81420a17>] entry_SYSCALL_64_fastpath+0x12/0x6a
> > > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > >  RSP <ffff88007fea3648>
> > > CR2: 0000000000000008
> > > BUG: unable to handle kernel ---[ end trace e81a82c8122b447d ]---
> > > Kernel panic - not syncing: Fatal exception
> > > 
> > > NULL pointer dereference at 0000000000000008
> > > IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > PGD 0 
> > > Oops: 0000 [#2] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 10 PID: 59 Comm: khugepaged Tainted: G      D         4.3.0-rc5-mm1-diet-meta+ #1545
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
> > > RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > > RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
> > > RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
> > > RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
> > > R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
> > > R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
> > > FS:  0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
> > > Stack:
> > >  ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
> > >  ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
> > >  ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
> > > Call Trace:
> > >  [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
> > >  [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
> > >  [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
> > >  [<ffffffff81125b13>] page_referenced+0x1a3/0x220
> > >  [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
> > >  [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
> > >  [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
> > >  [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
> > >  [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
> > >  [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
> > >  [<ffffffff811025f0>] shrink_zone+0x90/0x250
> > >  [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
> > >  [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
> > >  [<ffffffff811496c3>] try_charge+0x163/0x700
> > >  [<ffffffff8141d1f3>] ? schedule+0x33/0x80
> > >  [<ffffffff8114d45f>] mem_cgroup_try_charge+0x9f/0x1d0
> > >  [<ffffffff811434bc>] khugepaged+0x7cc/0x1ac0
> > >  [<ffffffff81066e01>] ? hrtick_update+0x1/0x70
> > >  [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
> > >  [<ffffffff81142cf0>] ? total_mapcount+0x70/0x70
> > >  [<ffffffff81056cd9>] kthread+0xc9/0xe0
> > >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> > >  [<ffffffff81420d6f>] ret_from_fork+0x3f/0x70
> > >  [<ffffffff81056c10>] ? kthread_park+0x60/0x60
> > > Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
> > > RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
> > >  RSP <ffff8800b985f778>
> > > CR2: 0000000000000008
> > > ---[ end trace e81a82c8122b447e ]---
> > > Shutting down cpus with NMI
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Kernel Offset: disabled
> > 
> 
> -- 
>  Kirill A. Shutemov

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-22  0:06   ` Minchan Kim
@ 2015-10-22  0:59     ` Hugh Dickins
  2015-10-22  1:21       ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Hugh Dickins @ 2015-10-22  0:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Kirill A. Shutemov, Andrew Morton, linux-mm, linux-kernel,
	Hugh Dickins, Rik van Riel, Mel Gorman, Michal Hocko,
	Johannes Weiner, Vlastimil Babka

On Thu, 22 Oct 2015, Minchan Kim wrote:
> 
> I added the code to check it and queued it again but I had another oops
> in this time but symptom is related to anon_vma, too.
> (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> It seems page_get_anon_vma returns NULL since the page was not page_mapped
> at that time but second check of page_mapped right before try_to_unmap seems
> to be true.
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)

That's interesting, that's one I added in my page migration series.
Let me think on it, but it could well relate to the one you got before.

> page->mem_cgroup:ffff88007f3dcc00
> ------------[ cut here ]------------
> kernel BUG at mm/migrate.c:889!
> invalid opcode: 0000 [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557

Hmm, it might be me to blame, or it might be Kirill, don't know yet.

Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
I haven't digested yet, but it might turn out to be relevant.

Hugh

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-22  0:59     ` Hugh Dickins
@ 2015-10-22  1:21       ` Minchan Kim
  2015-10-22  9:00         ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-10-22  1:21 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

Hello Hugh,

On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > 
> > I added the code to check it and queued it again but I had another oops
> > in this time but symptom is related to anon_vma, too.
> > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > at that time but second check of page_mapped right before try_to_unmap seems
> > to be true.
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> 
> That's interesting, that's one I added in my page migration series.
> Let me think on it, but it could well relate to the one you got before.

I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
instead of next-20151021 to remove noise from your migration cleanup
series and will test it again.
If it is fixed, I will test again with your migration patchset, then.

> 
> > page->mem_cgroup:ffff88007f3dcc00
> > ------------[ cut here ]------------
> > kernel BUG at mm/migrate.c:889!
> > invalid opcode: 0000 [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> 
> Hmm, it might be me to blame, or it might be Kirill, don't know yet.

It might be me, too.

> 
> Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> I haven't digested yet, but it might turn out to be relevant.
> 
> Hugh

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-21  5:28 kernel oops on mmotm-2015-10-15-15-20 Minchan Kim
  2015-10-21 11:07 ` Kirill A. Shutemov
@ 2015-10-22  2:15 ` Hugh Dickins
  2015-10-22  4:25   ` Hugh Dickins
  1 sibling, 1 reply; 33+ messages in thread
From: Hugh Dickins @ 2015-10-22  2:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kirill A. Shutemov, linux-mm, linux-kernel,
	Hugh Dickins, Rik van Riel, Mel Gorman, Michal Hocko,
	Johannes Weiner, Vlastimil Babka

On Thu, 22 Oct 2015, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap seems
> > > to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

Not a good use of your time, I think.  It's sure to be fixed in the
rc5-mmotm because that VM_BUG_ON_PAGE(blah) just does not exist in
that tree: I added it to verify my reasoning in changing the comments
about page_get_anon_vma() and PageSwapCache in mm/migrate.c.

> 
> > 
> > > page->mem_cgroup:ffff88007f3dcc00
> > > ------------[ cut here ]------------
> > > kernel BUG at mm/migrate.c:889!
> > > invalid opcode: 0000 [#1] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc6-next-20151021-THP-ref-madv_free+ #1557
> > 
> > Hmm, it might be me to blame, or it might be Kirill, don't know yet.
> 
> It might be me, either.
> 
> > 
> > Oh, hold on, I think Andrew has just posted a new mmotm, and it includes
> > an update to Kirill's migrate_pages-try-to-split-pages-on-queueing.patch:
> > I haven't digested yet, but it might turn out to be relevant.

Sorry, I think that was an irrelevant suggestion: today's new rc6-mmotm
is identical to yesterday's there, and the patch that was removed appears
to be identical to the one added.

Hugh

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-22  2:15 ` Hugh Dickins
@ 2015-10-22  4:25   ` Hugh Dickins
  2015-10-22 22:26     ` Hugh Dickins
  0 siblings, 1 reply; 33+ messages in thread
From: Hugh Dickins @ 2015-10-22  4:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Minchan Kim, Andrew Morton, Kirill A. Shutemov, linux-mm,
	linux-kernel, Rik van Riel, Mel Gorman, Michal Hocko,
	Johannes Weiner, Vlastimil Babka

On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.

I think I have introduced a bug there; or rather, made more evident
a pre-existing bug.  But I'm not sure yet: the stacktrace was from
compaction (called by khugepaged, but that may not be relevant at all),
and thinking through the races with isolate_migratepages_block() is
never easy.

What's certain is that I was not giving any thought to
isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
I was thinking about "stable" anonymous pages, and how they get
faulted back in from swapcache while holding page lock.

It looks to me now as if a page might not yet be PageAnon when it's
first tested in __unmap_and_move(), when going to page_get_anon_vma();
but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
where I inserted the VM_BUG_ON_PAGE().

If so, the code would always have been wrong (trying to unmap the
anonymous page, and later remap its replacement, without a hold on
the anon_vma needed to guide both lookups); but I'll have made it
more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
that's a good step forward :)

There's a reference count check in isolate_migratepages_block()
before this, which would make it unlikely, but I doubt rules it out.

However... you did hit an anon_vma reference counting problem before
my migration changes went in, and Kirill had a vague suspicion that
he might be screwing up anon_vma refcounting in split_huge_page():
if he confirms that, I'd say it's more likely to be the cause of
your crash on this occasion.

Not hard to fix mine (though we'll probably have to lose the
VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
trivial fix), I just want to give the races more thought.

However it turns out, I think you have a very useful test there.

(And I've observed no PageDirty problems with your recent patchsets,
though I don't use MADV_FREE at all myself.)

Hugh

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-22  1:21       ` Minchan Kim
@ 2015-10-22  9:00         ` Minchan Kim
  2015-10-29  0:25           ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-10-22  9:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 4093 bytes --]

On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> Hello Hugh,
> 
> On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > 
> > > I added the code to check it and queued it again but I had another oops
> > > in this time but symptom is related to anon_vma, too.
> > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > at that time but second check of page_mapped right before try_to_unmap seems
> > > to be true.
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > 
> > That's interesting, that's one I added in my page migration series.
> > Let me think on it, but it could well relate to the one you got before.
> 
> I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> instead of next-20151021 to remove noise from your migration cleanup
> series and will test it again.
> If it is fixed, I will test again with your migration patchset, then.

I tested mmotm-2015-10-15-15-20 for a long time with the test program I
attach, so Hugh's migration patchset is not in there.
At Kirill's request, I also added the debug code below to all test kernels.

diff --git a/mm/rmap.c b/mm/rmap.c
index ddfb9be72366..1c23b70b1f57 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -513,6 +513,13 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 
        anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
        root_anon_vma = READ_ONCE(anon_vma->root);
+
+       if (root_anon_vma == NULL) {
+               printk("anon_vma %p refcount %d\n", anon_vma,
+                       atomic_read(&anon_vma->refcount));
+               VM_BUG_ON_PAGE(1, page);
+       }
+
        if (down_read_trylock(&root_anon_vma->rwsem)) {
                /*
                 * If the page is still mapped, then this anon_vma is still


1. mmotm-2015-10-15-15-20 + kirill's pte_mkdirty

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff88007f1ed780 idx:1 val:488
BUG: Bad rss-counter state mm:ffff88007f1ed780 idx:2 val:24

2nd trial:

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff8800a5cca680 idx:1 val:512
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS

2. mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP.

1st trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff88007f4c2d80 idx:1 val:511
BUG: Bad rss-counter state mm:ffff88007f4c2d80 idx:2 val:1

2nd trial:
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
anon_vma ffff880000089aa0 refcount 0
page:ffffea0001a2ea40 count:3 mapcount:1 mapping:ffff880000089aa1 index:0x6000047a9
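
The last two lines agree with the debug patch: the reported mapping
(ffff880000089aa1) is the printed anon_vma address (ffff880000089aa0) with
the low tag bit set. A minimal userspace sketch of that decoding, assuming
PAGE_MAPPING_ANON is 1 as the subtraction in the patch suggests; the helper
names are hypothetical and are not kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low bit of page->mapping tags an anonymous page, which is
 * what the "anon_mapping - PAGE_MAPPING_ANON" arithmetic in the patch implies. */
#define PAGE_MAPPING_ANON 1ULL

/* Hypothetical helper: nonzero if the raw mapping value carries the anon tag. */
static int mapping_is_anon(uint64_t mapping)
{
	return (mapping & PAGE_MAPPING_ANON) != 0;
}

/* Hypothetical helper: strip the tag to recover the anon_vma address. */
static uint64_t mapping_to_anon_vma(uint64_t mapping)
{
	assert(mapping_is_anon(mapping));
	return mapping - PAGE_MAPPING_ANON;
}
```

Feeding it the mapping value from the dump yields the anon_vma address that
the printk reported with refcount 0, so the page still points at an anon_vma
whose last reference is already gone.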

I tested it in a KVM guest with 12 cores and 3G of memory.
For mmotm-2015-10-15-15-20-no-madvise_free, I tweaked the test program to do
madvise_dontneed instead of madvise_free via the patch below.

For the testing,

        gcc -o oops oops.c
        ./memcg_test.sh

I will be away from now on, so please understand if my responses are late,
but I hope my test program will reproduce it on your machine.

diff --git a/oops.c b/oops.c
index e50330a..c8298f8 100644
--- a/oops.c
+++ b/oops.c
@@ -8,7 +8,7 @@
 #include <errno.h>
 #include <signal.h>
 
-#define MADV_FREE 5
+#define MADV_FREE 4
 
 int pid;


[-- Attachment #2: memcg_move_task.sh --]
[-- Type: application/x-sh, Size: 338 bytes --]

[-- Attachment #3: memcg_test.sh --]
[-- Type: application/x-sh, Size: 228 bytes --]

[-- Attachment #4: oops.c --]
[-- Type: text/x-csrc, Size: 3945 bytes --]

#include <sys/types.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>

#define MADV_FREE 4

int pid;

void sig_handler(int signo)
{
        printf("pid %d sig received %d\n", pid, signo);
	exit(1);
}

void free_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;

	for (i = 0; i < buf_count; i++) {
		if (bufs[i] != NULL) {
			munmap(bufs[i],  buf_size);
			bufs[i] = NULL;
		}
	}
}

void alloc_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;
	time_t rawtime;
	struct tm * timeinfo;
	void *addr = (void*)0x600000000000;

	for (i = 0; i < buf_count; i++) {
		void *ptr = NULL;

		ptr = mmap(addr, buf_size, PROT_READ|PROT_WRITE,
			MAP_ANON|MAP_PRIVATE|MAP_FIXED, 0, 0);

		if (ptr == MAP_FAILED) {
			char bufs[64];

			sprintf(bufs, "cat /proc/%d/maps", pid);
			printf("error to allocate %p\n", addr);

			system(bufs);
			exit(1);
		}

		addr += buf_size;
		bufs[i] = ptr;
	}
}

void fill_bufs(void **bufs, unsigned long buf_count, unsigned long buf_size)
{
	int i;
	char msg[64] = {0, };

	for (i = 0; i < buf_count; i++)
		memset(bufs[i], 'a' + i, buf_size);

	sprintf(msg, "pid %d buf_count %ld complete", pid, buf_count);
}

void madvise_bufs(void **bufs, unsigned long buf_count,
			unsigned long buf_size, int advise)
{
	int i, ret;

	for (i = 0; i < buf_count; i++) {
retry:
		ret = madvise(bufs[i], buf_size, advise);
		if (ret) {
			/* madvise() returns -1 and reports the cause in errno */
			int err = errno;

			perror("fail to madvise");
			if (err == EAGAIN) {
				sleep(1);
				goto retry;
			}
			exit(1);
		}
	}
}

void madvise_free_bufs(void **bufs, unsigned long buf_count,
			unsigned long buf_size)
{
	int i;

	for (i = 0; i < buf_count; i++) {
		if (madvise(bufs[i], buf_size, MADV_FREE)) {
			printf("[%d] bufs[%d] %p madvise_free fail\n",
				pid, i, bufs[i]);
		}
	}
}

void check_madvise_bufs(void **bufs, unsigned long buf_count,
			unsigned long buf_size, int freeable)
{
	int i, j;

	for (i = 0; i < buf_count; i++) {
		char tmp;
		void *buf = bufs[i];

		for (j = 0; j < buf_size; j++) {
			int ret;
			unsigned long addr;

			tmp = *(char*)(buf + j);
			/* The page was not purged */
			if (tmp == 'a' + i)
				continue;

			/* The page was purged */
			if (freeable && (int)tmp == 0)
				continue;

			/* Something wrong happens */
			addr = (unsigned long)(buf + j);
			printf("pid %d bufaddr %p ofs %d freeable %d expected %c but %c\n",
					pid, buf, j, freeable, 'a' + i, tmp);
			exit(1);
		}

	}
}

int main(int argc, char *argv[])
{
	int i, ret, advise;
	unsigned long buf_size, buf_count, loop;
	void **bufs;

	pid = getpid();

	if (argc != 4) {
		printf("check your argument\n");
		return 1;
	}

	buf_size = atol(argv[1]);
	buf_count = atol(argv[2]);
	advise = atol(argv[3]);

	if (buf_size & ((2<<20) - 1)) {
		printf("buf_size should be 2M aligned\n");
		return 1;
	}

	printf("[%d] buf size %ld buf_count %ld advise %d\n",
			pid, buf_size, buf_count, advise);

        if (signal(SIGINT, sig_handler) == SIG_ERR) {
                printf("Fail to register signal handler\n");
                return 1;
        }

        if (signal(SIGHUP, sig_handler) == SIG_ERR) {
                printf("Fail to register signal handler\n");
                return 1;
        }

	bufs = malloc(sizeof(void *) * buf_count);
	if (!bufs)
		return 1;

	memset(bufs, 0, sizeof(void *) * buf_count);

	srandom(pid);

	while (1) {
		int madvise_free = random() % 2;

		alloc_bufs(bufs, buf_count, buf_size);

		fill_bufs(bufs, buf_count, buf_size);

		/* We touched buffers so MADV_FREE cannot free pages */
		check_madvise_bufs(bufs, buf_count, buf_size, 0);

		madvise_bufs(bufs, buf_count, buf_size, advise);

		sleep(1);

		/* syscall MADV_FREE */
		madvise_free_bufs(bufs, buf_count, buf_size);

		sleep(1);

		check_madvise_bufs(bufs, buf_count, buf_size, 1);
		free_bufs(bufs, buf_count, buf_size);
	}

	return 0;
}

[-- Attachment #5: oops.sh --]
[-- Type: application/x-sh, Size: 882 bytes --]

[-- Attachment #6: setup_memcg.sh --]
[-- Type: application/x-sh, Size: 311 bytes --]

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-22  4:25   ` Hugh Dickins
@ 2015-10-22 22:26     ` Hugh Dickins
  0 siblings, 0 replies; 33+ messages in thread
From: Hugh Dickins @ 2015-10-22 22:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Minchan Kim, Andrew Morton, Kirill A. Shutemov, linux-mm,
	linux-kernel, Rik van Riel, Mel Gorman, Michal Hocko,
	Johannes Weiner, Vlastimil Babka

On Wed, 21 Oct 2015, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Hugh Dickins wrote:
> > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> 
> I think I have introduced a bug there; or rather, made more evident
> a pre-existing bug.  But I'm not sure yet: the stacktrace was from
> compaction (called by khugepaged, but that may not be relevant at all),
> and thinking through the races with isolate_migratepages_block() is
> never easy.
> 
> What's certain is that I was not giving any thought to
> isolate_migratepages_block() when I added that VM_BUG_ON_PAGE():
> I was thinking about "stable" anonymous pages, and how they get
> faulted back in from swapcache while holding page lock.
> 
> It looks to me now as if a page might not yet be PageAnon when it's
> first tested in __unmap_and_move(), when going to page_get_anon_vma();
> but is page_mapped() and PageAnon() by time of calling try_to_unmap(),
> where I inserted the VM_BUG_ON_PAGE().
> 
> If so, the code would always have been wrong (trying to unmap the
> anonymous page, and later remap its replacement, without a hold on
> the anon_vma needed to guide both lookups); but I'll have made it
> more glaringly wrong with the VM_BUG_ON_PAGE() - let me pretend
> that's a good step forward :)
> 
> There's a reference count check in isolate_migratepages_block()
> before this, which would make it unlikely, but I doubt rules it out.
> 
> However... you did hit an anon_vma reference counting problem before
> my migration changes went in, and Kirill had a vague suspicion that
> he might be screwing up anon_vma refcounting in split_huge_page():
> if he confirms that, I'd say it's more likely to be the cause of
> your crash on this occasion.
> 
> Not hard to fix mine (though we'll probably have to lose the
> VM_BUG_ON_PAGE on the way, so the real fix will be hidden by that
> trivial fix), I just want to give the races more thought.

And after giving it more thought, I realize that I was wrong yesterday,
and the new VM_BUG_ON_PAGE() should be good as is: my guess is that it
is simply alerting you to the same anon_vma reference counting issue
as you had already hit without that patch.

What I was forgetting yesterday, is that isolate_migratepages_block()
can only take the page for migration when it's PageLRU(): and
do_anonymous_page() only adds a page to the LRU after it has been
marked as mapped and PageAnon.

So the window that worried me yesterday, that __unmap_and_move()
might see !PageAnon, then reach try_to_unmap() with it page_mapped
and PageAnon: that window does not exist, with or without my changes.
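
The ordering argument above can be sketched as a toy model. This is only an
illustration under stated assumptions: the struct, the step names and the
helpers are hypothetical, and the three steps simply encode the order the
message describes (mapped and PageAnon first, LRU last).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the three page properties discussed; not struct page. */
struct toy_page {
	bool anon;
	bool mapped;
	bool lru;
};

/* Fault-path steps, in the order described for do_anonymous_page(). */
enum fault_step { STEP_ALLOCATED, STEP_MAPPED_ANON, STEP_ON_LRU };

/* State of the page once the given step has completed. */
static struct toy_page page_at(enum fault_step step)
{
	struct toy_page p = { false, false, false };

	if (step >= STEP_MAPPED_ANON) {
		p.anon = true;
		p.mapped = true;
	}
	if (step >= STEP_ON_LRU)
		p.lru = true;
	return p;
}

/* Migration may only isolate a page that is already on the LRU. */
static bool can_isolate(const struct toy_page *p)
{
	return p->lru;
}

/* True if some step leaves a page isolatable but not yet anon and mapped. */
static bool window_exists(void)
{
	for (int s = STEP_ALLOCATED; s <= STEP_ON_LRU; s++) {
		struct toy_page p = page_at((enum fault_step)s);

		if (can_isolate(&p) && !(p.anon && p.mapped))
			return true;
	}
	return false;
}
```

In this model window_exists() returns false: no step leaves a page that
migration could pick up while it is still !PageAnon.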

Hugh

> 
> However it turns out, I think you have a very useful test there.
> 
> (And I've observed no PageDirty problems with your recent patchsets,
> though I don't use MADV_FREE at all myself.)
> 
> Hugh

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-22  9:00         ` Minchan Kim
@ 2015-10-29  0:25           ` Kirill A. Shutemov
  2015-10-29  7:58             ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-10-29  0:25 UTC (permalink / raw)
  To: Minchan Kim, Hugh Dickins, Sasha Levin
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Mel Gorman,
	Michal Hocko, Johannes Weiner, Vlastimil Babka

On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > Hello Hugh,
> > 
> > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > 
> > > > I added the code to check it and queued it again but I had another oops
> > > > in this time but symptom is related to anon_vma, too.
> > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > to be true.
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > 
> > > That's interesting, that's one I added in my page migration series.
> > > Let me think on it, but it could well relate to the one you got before.
> > 
> > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > instead of next-20151021 to remove noise from your migration cleanup
> > series and will test it again.
> > If it is fixed, I will test again with your migration patchset, then.
> 
> I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> Therefore, there is no patchset from Hugh's migration patch in there.
> And I added below debug code with request from Kirill to all test kernels.

It took far too long (and a lot of printk()), but I think I have finally
tracked it down.

The patch below seems to fix the issue for me. It's not yet properly tested,
but it looks like it works.

The problem was my wrong assumption about how migration works: I thought that
the kernel would wait for migration to finish before tearing down the mapping.

But it turns out that's not true.

As a result, if zap_pte_range() races with split_huge_page(), we can end up
with a page which is not mapped anymore but has _count and _mapcount
elevated. The page is on the LRU too, so it's still reachable by vmscan and by
pfn scanners (Sasha showed a few similar traces from compaction too).
It's likely that page->mapping in this case would point to a freed anon_vma.

BOOM!

The patch modifies the freeze/unfreeze_page() code to match the logic of
normal migration entries: on setup we remove the page from rmap and drop the
pin; on removal we take the pin back and put the page back on rmap. This way,
even if the migration entry is removed under us, we don't corrupt the page's
state.

Please, test.

Not-Yet-Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e0fe82a0fae..192b50c7526c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
+
+	if (freeze) {
+		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+			page_remove_rmap(page + i, false);
+			put_page(page + i);
+		}
+	}
 }
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
@@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (pte_soft_dirty(entry))
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
+		page_remove_rmap(page, false);
+		put_page(page);
 	}
 	pte_unmap_unlock(pte, ptl);
 }
@@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		return;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
 	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
-		if (!page_mapped(page))
-			continue;
 		if (!is_swap_pte(pte[i]))
 			continue;
 
@@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (migration_entry_to_page(swp_entry) != page)
 			continue;
 
+		get_page(page);
+		page_add_anon_rmap(page, vma, address, false);
+
 		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
 		entry = pte_mkdirty(entry);
 		if (is_write_migration_entry(swp_entry))
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-29  0:25           ` Kirill A. Shutemov
@ 2015-10-29  7:58             ` Minchan Kim
  2015-10-29  9:43               ` Kirill A. Shutemov
  2015-10-29  9:52               ` Kirill A. Shutemov
  0 siblings, 2 replies; 33+ messages in thread
From: Minchan Kim @ 2015-10-29  7:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > Hello Hugh,
> > > 
> > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > 
> > > > > I added the code to check it and queued it again but I had another oops
> > > > > in this time but symptom is related to anon_vma, too.
> > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > to be true.
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > 
> > > > That's interesting, that's one I added in my page migration series.
> > > > Let me think on it, but it could well relate to the one you got before.
> > > 
> > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > instead of next-20151021 to remove noise from your migration cleanup
> > > series and will test it again.
> > > If it is fixed, I will test again with your migration patchset, then.
> > 
> > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > Therefore, there is no patchset from Hugh's migration patch in there.
> > And I added below debug code with request from Kirill to all test kernels.
> 
> It took too long time (and a lot of printk()), but I think I track it down
> finally.
>  
> The patch below seems fixes issue for me. It's not yet properly tested, but
> looks like it works.
> 
> The problem was my wrong assumption on how migration works: I thought that
> kernel would wait migration to finish on before deconstruction mapping.
> 
> But turn out that's not true.
> 
> As result if zap_pte_range() races with split_huge_page(), we can end up
> with page which is not mapped anymore but has _count and _mapcount
> elevated. The page is on LRU too. So it's still reachable by vmscan and by
> pfn scanners (Sasha showed few similar traces from compaction too).
> It's likely that page->mapping in this case would point to freed anon_vma.
> 
> BOOM!
> 
> The patch modify freeze/unfreeze_page() code to match normal migration
> entries logic: on setup we remove page from rmap and drop pin, on removing
> we get pin back and put page on rmap. This way even if migration entry
> will be removed under us we don't corrupt page's state.
> 
> Please, test.
> 

Kernel: mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch.
I ran the test I sent to you (i.e., oops.c + memcg_test.sh) and got:

page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
page->mem_cgroup:ffff88007f613c00
------------[ cut here ]------------
kernel BUG at mm/rmap.c:1156!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ #1573
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800b8804ec0 ti: ffff88000005c000 task.ti: ffff88000005c000
RIP: 0010:[<ffffffff81128223>]  [<ffffffff81128223>] do_page_add_anon_rmap+0x323/0x360
RSP: 0000:ffff88000005f758  EFLAGS: 00010292
RAX: 0000000000000021 RBX: ffffea00016a0000 RCX: ffffffff81830db8
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
RBP: ffff88000005f780 R08: 0000000000000000 R09: ffff8800000b8be0
R10: ffffffff8163d7c0 R11: 00000000000001a5 R12: ffff88007e85ddc0
R13: 0000600001800000 R14: 0000000000000000 R15: ffff88007e85ddc0
FS:  00007f5cd5fea740(0000) GS:ffff8800bfae0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000600004c03000 CR3: 000000007f017000 CR4: 00000000000006a0
Stack:
 ffff88007f351000 ffff88007f352000 ffffea00016a0000 0000600001800000
 ffff88007e85ddc0 ffff88000005f790 ffffffff81128278 ffff88000005f800
 ffffffff81146dbb 00000006000019ff 0000000600001800 0000160000000000
Call Trace:
 [<ffffffff81128278>] page_add_anon_rmap+0x18/0x20
 [<ffffffff81146dbb>] unfreeze_page+0x24b/0x330
 [<ffffffff8114bb5f>] split_huge_page_to_list+0x3df/0x920
 [<ffffffff811321cf>] ? scan_swap_map+0x37f/0x550
 [<ffffffff8112f996>] add_to_swap+0xb6/0x100
 [<ffffffff81103c87>] shrink_page_list+0x3b7/0xdc0
 [<ffffffff81104d4c>] shrink_inactive_list+0x18c/0x4b0
 [<ffffffff811059af>] shrink_lruvec+0x58f/0x730
 [<ffffffff81105c24>] shrink_zone+0xd4/0x280
 [<ffffffff81105efd>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff8110635d>] try_to_free_mem_cgroup_pages+0x9d/0x120
 [<ffffffff8114e2f5>] try_charge+0x175/0x720
 [<ffffffff812728c3>] ? radix_tree_lookup_slot+0x13/0x30
 [<ffffffff810efd6e>] ? find_get_entry+0x1e/0xc0
 [<ffffffff811520c5>] mem_cgroup_try_charge+0x85/0x1d0
 [<ffffffff8111c4d7>] do_swap_page+0xd7/0x5a0
 [<ffffffff8111e203>] handle_mm_fault+0x803/0x1000
 [<ffffffff8106efda>] ? pick_next_task_fair+0x3ba/0x480
 [<ffffffff8105f8c0>] ? finish_task_switch+0x70/0x260
 [<ffffffff81033629>] __do_page_fault+0x189/0x400
 [<ffffffff810338ac>] do_page_fault+0xc/0x10
 [<ffffffff81428842>] page_fault+0x22/0x30




> Not-Yet-Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5e0fe82a0fae..192b50c7526c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  
>  	smp_wmb(); /* make pte visible before pmd */
>  	pmd_populate(mm, pmd, pgtable);
> +
> +	if (freeze) {
> +		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +			page_remove_rmap(page + i, false);
> +			put_page(page + i);
> +		}
> +	}
>  }
>  
>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> @@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
>  		if (pte_soft_dirty(entry))
>  			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>  		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
> +		page_remove_rmap(page, false);
> +		put_page(page);
>  	}
>  	pte_unmap_unlock(pte, ptl);
>  }
> @@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
>  		return;
>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
>  	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
> -		if (!page_mapped(page))
> -			continue;
>  		if (!is_swap_pte(pte[i]))
>  			continue;
>  
> @@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
>  		if (migration_entry_to_page(swp_entry) != page)
>  			continue;
>  
> +		get_page(page);
> +		page_add_anon_rmap(page, vma, address, false);
> +
>  		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
>  		entry = pte_mkdirty(entry);
>  		if (is_write_migration_entry(swp_entry))
> -- 
>  Kirill A. Shutemov
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-29  7:58             ` Minchan Kim
@ 2015-10-29  9:43               ` Kirill A. Shutemov
  2015-10-29  9:52               ` Kirill A. Shutemov
  1 sibling, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-10-29  9:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))

The VM_BUG_ON_PAGE() is bogus after the patch. Just drop it.

> page->mem_cgroup:ffff88007f613c00
> ------------[ cut here ]------------
> kernel BUG at mm/rmap.c:1156!
> invalid opcode: 0000 [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 3312 Comm: oops Not tainted 4.3.0-rc5-mm1-madv-free-no-lazy-thp+ #1573
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800b8804ec0 ti: ffff88000005c000 task.ti: ffff88000005c000
> RIP: 0010:[<ffffffff81128223>]  [<ffffffff81128223>] do_page_add_anon_rmap+0x323/0x360
> RSP: 0000:ffff88000005f758  EFLAGS: 00010292
> RAX: 0000000000000021 RBX: ffffea00016a0000 RCX: ffffffff81830db8
> RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
> RBP: ffff88000005f780 R08: 0000000000000000 R09: ffff8800000b8be0
> R10: ffffffff8163d7c0 R11: 00000000000001a5 R12: ffff88007e85ddc0
> R13: 0000600001800000 R14: 0000000000000000 R15: ffff88007e85ddc0
> FS:  00007f5cd5fea740(0000) GS:ffff8800bfae0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000600004c03000 CR3: 000000007f017000 CR4: 00000000000006a0
> Stack:
>  ffff88007f351000 ffff88007f352000 ffffea00016a0000 0000600001800000
>  ffff88007e85ddc0 ffff88000005f790 ffffffff81128278 ffff88000005f800
>  ffffffff81146dbb 00000006000019ff 0000000600001800 0000160000000000
> Call Trace:
>  [<ffffffff81128278>] page_add_anon_rmap+0x18/0x20
>  [<ffffffff81146dbb>] unfreeze_page+0x24b/0x330
>  [<ffffffff8114bb5f>] split_huge_page_to_list+0x3df/0x920
>  [<ffffffff811321cf>] ? scan_swap_map+0x37f/0x550
>  [<ffffffff8112f996>] add_to_swap+0xb6/0x100
>  [<ffffffff81103c87>] shrink_page_list+0x3b7/0xdc0
>  [<ffffffff81104d4c>] shrink_inactive_list+0x18c/0x4b0
>  [<ffffffff811059af>] shrink_lruvec+0x58f/0x730
>  [<ffffffff81105c24>] shrink_zone+0xd4/0x280
>  [<ffffffff81105efd>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff8110635d>] try_to_free_mem_cgroup_pages+0x9d/0x120
>  [<ffffffff8114e2f5>] try_charge+0x175/0x720
>  [<ffffffff812728c3>] ? radix_tree_lookup_slot+0x13/0x30
>  [<ffffffff810efd6e>] ? find_get_entry+0x1e/0xc0
>  [<ffffffff811520c5>] mem_cgroup_try_charge+0x85/0x1d0
>  [<ffffffff8111c4d7>] do_swap_page+0xd7/0x5a0
>  [<ffffffff8111e203>] handle_mm_fault+0x803/0x1000
>  [<ffffffff8106efda>] ? pick_next_task_fair+0x3ba/0x480
>  [<ffffffff8105f8c0>] ? finish_task_switch+0x70/0x260
>  [<ffffffff81033629>] __do_page_fault+0x189/0x400
>  [<ffffffff810338ac>] do_page_fault+0xc/0x10
>  [<ffffffff81428842>] page_fault+0x22/0x30
> 
> 
> 
> 
> > Not-Yet-Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 5e0fe82a0fae..192b50c7526c 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  
> >  	smp_wmb(); /* make pte visible before pmd */
> >  	pmd_populate(mm, pmd, pgtable);
> > +
> > +	if (freeze) {
> > +		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> > +			page_remove_rmap(page + i, false);
> > +			put_page(page + i);
> > +		}
> > +	}
> >  }
> >  
> >  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> > @@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
> >  		if (pte_soft_dirty(entry))
> >  			swp_pte = pte_swp_mksoft_dirty(swp_pte);
> >  		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
> > +		page_remove_rmap(page, false);
> > +		put_page(page);
> >  	}
> >  	pte_unmap_unlock(pte, ptl);
> >  }
> > @@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
> >  		return;
> >  	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
> >  	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
> > -		if (!page_mapped(page))
> > -			continue;
> >  		if (!is_swap_pte(pte[i]))
> >  			continue;
> >  
> > @@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
> >  		if (migration_entry_to_page(swp_entry) != page)
> >  			continue;
> >  
> > +		get_page(page);
> > +		page_add_anon_rmap(page, vma, address, false);
> > +
> >  		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
> >  		entry = pte_mkdirty(entry);
> >  		if (is_write_migration_entry(swp_entry))
> > -- 
> >  Kirill A. Shutemov
> > 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-29  7:58             ` Minchan Kim
  2015-10-29  9:43               ` Kirill A. Shutemov
@ 2015-10-29  9:52               ` Kirill A. Shutemov
  2015-10-30  7:03                 ` Minchan Kim
  1 sibling, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-10-29  9:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > Hello Hugh,
> > > > 
> > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > 
> > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > to be true.
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > 
> > > > > That's interesting, that's one I added in my page migration series.
> > > > > Let me think on it, but it could well relate to the one you got before.
> > > > 
> > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > series and will test it again.
> > > > If it is fixed, I will test again with your migration patchset, then.
> > > 
> > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > And I added below debug code with request from Kirill to all test kernels.
> > 
> > It took too long time (and a lot of printk()), but I think I track it down
> > finally.
> >  
> > The patch below seems fixes issue for me. It's not yet properly tested, but
> > looks like it works.
> > 
> > The problem was my wrong assumption on how migration works: I thought that
> > kernel would wait migration to finish on before deconstruction mapping.
> > 
> > But turn out that's not true.
> > 
> > As result if zap_pte_range() races with split_huge_page(), we can end up
> > with page which is not mapped anymore but has _count and _mapcount
> > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > pfn scanners (Sasha showed few similar traces from compaction too).
> > It's likely that page->mapping in this case would point to freed anon_vma.
> > 
> > BOOM!
> > 
> > The patch modify freeze/unfreeze_page() code to match normal migration
> > entries logic: on setup we remove page from rmap and drop pin, on removing
> > we get pin back and put page on rmap. This way even if migration entry
> > will be removed under us we don't corrupt page's state.
> > 
> > Please, test.
> > 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> one I sent to you(ie, oops.c + memcg_test.sh)
> 
> page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> page->mem_cgroup:ffff88007f613c00

Ignore my previous answer. Still sleeping.

I think the right way to fix it is something like:

diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..f2d46792a554 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (PageTransCompound(page) && compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first = atomic_inc_and_test(&page->_mapcount);
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-29  9:52               ` Kirill A. Shutemov
@ 2015-10-30  7:03                 ` Minchan Kim
  2015-11-02 12:57                   ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-10-30  7:03 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > Hello Hugh,
> > > > > 
> > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > 
> > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > to be true.
> > > > > > > 
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > 
> > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > 
> > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > series and will test it again.
> > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > 
> > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > And I added below debug code with request from Kirill to all test kernels.
> > > 
> > > It took too long time (and a lot of printk()), but I think I track it down
> > > finally.
> > >  
> > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > looks like it works.
> > > 
> > > The problem was my wrong assumption on how migration works: I thought that
> > > kernel would wait migration to finish on before deconstruction mapping.
> > > 
> > > But turn out that's not true.
> > > 
> > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > with page which is not mapped anymore but has _count and _mapcount
> > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > 
> > > BOOM!
> > > 
> > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > we get pin back and put page on rmap. This way even if migration entry
> > > will be removed under us we don't corrupt page's state.
> > > 
> > > Please, test.
> > > 
> > 
> > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > one I sent to you(ie, oops.c + memcg_test.sh)
> > 
> > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > page->mem_cgroup:ffff88007f613c00
> 
> Ignore my previous answer. Still sleeping.
> 
> The right way to fix I think is something like:
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 35643176bc15..f2d46792a554 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
>  	bool compound = flags & RMAP_COMPOUND;
>  	bool first;
>  
> -	if (PageTransCompound(page)) {
> +	if (PageTransCompound(page) && compound) {
> +		atomic_t *mapcount;
>  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> -		if (compound) {
> -			atomic_t *mapcount;
> -
> -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> -			mapcount = compound_mapcount_ptr(page);
> -			first = atomic_inc_and_test(mapcount);
> -		} else {
> -			/* Anon THP always mapped first with PMD */
> -			first = 0;
> -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> -			atomic_inc(&page->_mapcount);
> -		}
> +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> +		mapcount = compound_mapcount_ptr(page);
> +		first = atomic_inc_and_test(mapcount);
>  	} else {
>  		VM_BUG_ON_PAGE(compound, page);
>  		first = atomic_inc_and_test(&page->_mapcount);
> -- 

Kernel: mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + the above patch.
Result:

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS

<SNIP>

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-10-30  7:03                 ` Minchan Kim
@ 2015-11-02 12:57                   ` Kirill A. Shutemov
  2015-11-03  3:02                     ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-02 12:57 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > Hello Hugh,
> > > > > > 
> > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > 
> > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > to be true.
> > > > > > > > 
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > 
> > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > 
> > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > series and will test it again.
> > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > 
> > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > 
> > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > finally.
> > > >  
> > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > looks like it works.
> > > > 
> > > > The problem was my wrong assumption on how migration works: I thought that
> > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > 
> > > > But turn out that's not true.
> > > > 
> > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > with page which is not mapped anymore but has _count and _mapcount
> > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > 
> > > > BOOM!
> > > > 
> > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > we get pin back and put page on rmap. This way even if migration entry
> > > > will be removed under us we don't corrupt page's state.
> > > > 
> > > > Please, test.
> > > > 
> > > 
> > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > 
> > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > page->mem_cgroup:ffff88007f613c00
> > 
> > Ignore my previous answer. Still sleeping.
> > 
> > The right way to fix I think is something like:
> > 
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 35643176bc15..f2d46792a554 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> >  	bool compound = flags & RMAP_COMPOUND;
> >  	bool first;
> >  
> > -	if (PageTransCompound(page)) {
> > +	if (PageTransCompound(page) && compound) {
> > +		atomic_t *mapcount;
> >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > -		if (compound) {
> > -			atomic_t *mapcount;
> > -
> > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > -			mapcount = compound_mapcount_ptr(page);
> > -			first = atomic_inc_and_test(mapcount);
> > -		} else {
> > -			/* Anon THP always mapped first with PMD */
> > -			first = 0;
> > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > -			atomic_inc(&page->_mapcount);
> > -		}
> > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > +		mapcount = compound_mapcount_ptr(page);
> > +		first = atomic_inc_and_test(mapcount);
> >  	} else {
> >  		VM_BUG_ON_PAGE(compound, page);
> >  		first = atomic_inc_and_test(&page->_mapcount);
> > -- 
> 
> kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> 
> <SNIP>
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1

Hm. I was not able to trigger this and don't see anything obvious that can
lead to this kind of mismatch :-/

I found one more bug: clearing of PageTail can be visible to other CPUs
before updated page->flags on the page.

I don't think this bug is connected to what you've reported, but worth
testing.
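
To illustrate the ordering problem, here is a minimal userspace sketch. It
is not the kernel code: struct fake_page, FAKE_PG_DIRTY and both helpers are
hypothetical names, and C11 release/acquire stands in for the smp_wmb() the
patch below adds before clear_compound_head(). The idea is only that the
flags must be written before the store that makes the page look non-tail,
so that anybody who sees "not a tail" also sees the updated flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct fake_page {
	unsigned long flags;	/* plain data, plays the role of page->flags */
	atomic_bool is_tail;	/* plays the role of the PageTail marker */
};

#define FAKE_PG_DIRTY (1UL << 0)	/* illustrative bit, not PG_dirty */

/*
 * Writer side: update the flags first, then publish the page by clearing
 * the tail marker. The release store keeps the flags write from being
 * reordered after the marker store, much as a write barrier placed
 * between the two stores would.
 */
static void fake_split_tail(struct fake_page *p, unsigned long head_flags)
{
	assert(p != NULL);
	p->flags = head_flags | FAKE_PG_DIRTY;
	atomic_store_explicit(&p->is_tail, false, memory_order_release);
}

/*
 * Reader side: only trust the flags once the page is seen as non-tail.
 * The acquire load pairs with the release store above.
 * Returns true and fills *flags if the page is no longer a tail.
 */
static bool fake_read_flags(struct fake_page *p, unsigned long *flags)
{
	assert(p != NULL && flags != NULL);
	if (atomic_load_explicit(&p->is_tail, memory_order_acquire))
		return false;
	*flags = p->flags;
	return true;
}
```

A caller would initialise is_tail to true, call fake_split_tail() from the
splitting side, and call fake_read_flags() from any other thread. Without
the ordering, a reader could see the cleared marker together with stale
flags, which is the window described above.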

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e0fe82a0fae..12bd8c5a4409 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
+
+	if (freeze) {
+		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+			page_remove_rmap(page + i, false);
+			put_page(page + i);
+		}
+	}
 }
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
@@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (pte_soft_dirty(entry))
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
+		page_remove_rmap(page, false);
+		put_page(page);
 	}
 	pte_unmap_unlock(pte, ptl);
 }
@@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		return;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
 	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
-		if (!page_mapped(page))
-			continue;
 		if (!is_swap_pte(pte[i]))
 			continue;
 
@@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
 		if (migration_entry_to_page(swp_entry) != page)
 			continue;
 
+		get_page(page);
+		page_add_anon_rmap(page, vma, address, false);
+
 		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
 		entry = pte_mkdirty(entry);
 		if (is_write_migration_entry(swp_entry))
@@ -3181,8 +3191,6 @@ static int __split_huge_page_tail(struct page *head, int tail,
 	 */
 	atomic_add(mapcount + 1, &page_tail->_count);
 
-	/* after clearing PageTail the gup refcount can be released */
-	smp_mb__after_atomic();
 
 	page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	page_tail->flags |= (head->flags &
@@ -3195,6 +3203,12 @@ static int __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_unevictable)));
 	page_tail->flags |= (1L << PG_dirty);
 
+	/*
+	 * After clearing PageTail the gup refcount can be released.
+	 * Page flags also must be visible before we make the page non-compound.
+	 */
+	smp_wmb();
+
 	clear_compound_head(page_tail);
 
 	if (page_is_young(head))
diff --git a/mm/rmap.c b/mm/rmap.c
index 35643176bc15..e4f8d9fb1c3d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
-	if (PageTransCompound(page)) {
+	if (compound) {
+		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (compound) {
-			atomic_t *mapcount;
-
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-			mapcount = compound_mapcount_ptr(page);
-			first = atomic_inc_and_test(mapcount);
-		} else {
-			/* Anon THP always mapped first with PMD */
-			first = 0;
-			VM_BUG_ON_PAGE(!page_mapcount(page), page);
-			atomic_inc(&page->_mapcount);
-		}
+		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+		mapcount = compound_mapcount_ptr(page);
+		first = atomic_inc_and_test(mapcount);
 	} else {
 		VM_BUG_ON_PAGE(compound, page);
 		first = atomic_inc_and_test(&page->_mapcount);
@@ -1201,7 +1193,6 @@ void do_page_add_anon_rmap(struct page *page,
 		 * disabled.
 		 */
 		if (compound) {
-			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
 		}
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-02 12:57                   ` Kirill A. Shutemov
@ 2015-11-03  3:02                     ` Minchan Kim
  2015-11-03  7:16                       ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-03  3:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

Hello Kirill,

On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > Hello Hugh,
> > > > > > > 
> > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > 
> > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > to be true.
> > > > > > > > > 
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > 
> > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > 
> > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > series and will test it again.
> > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > 
> > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > 
> > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > finally.
> > > > >  
> > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > looks like it works.
> > > > > 
> > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > 
> > > > > But turn out that's not true.
> > > > > 
> > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > 
> > > > > BOOM!
> > > > > 
> > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > will be removed under us we don't corrupt page's state.
> > > > > 
> > > > > Please, test.
> > > > > 
> > > > 
> > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > 
> > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > page->mem_cgroup:ffff88007f613c00
> > > 
> > > Ignore my previous answer. Still sleeping.
> > > 
> > > The right way to fix I think is something like:
> > > 
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 35643176bc15..f2d46792a554 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > >  	bool compound = flags & RMAP_COMPOUND;
> > >  	bool first;
> > >  
> > > -	if (PageTransCompound(page)) {
> > > +	if (PageTransCompound(page) && compound) {
> > > +		atomic_t *mapcount;
> > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > -		if (compound) {
> > > -			atomic_t *mapcount;
> > > -
> > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > -			mapcount = compound_mapcount_ptr(page);
> > > -			first = atomic_inc_and_test(mapcount);
> > > -		} else {
> > > -			/* Anon THP always mapped first with PMD */
> > > -			first = 0;
> > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > -			atomic_inc(&page->_mapcount);
> > > -		}
> > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > +		mapcount = compound_mapcount_ptr(page);
> > > +		first = atomic_inc_and_test(mapcount);
> > >  	} else {
> > >  		VM_BUG_ON_PAGE(compound, page);
> > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > -- 
> > 
> > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > 
> > <SNIP>
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> 
> Hm. I was not able to trigger this and don't see anything obvious that can
> lead to this kind of mismatch :-/
> 
> I found one more bug: clearing of PageTail can be visible to other CPUs
> before updated page->flags on the page.
> 
> I don't think this bug is connected to what you've reported, but worth
> testing.

I'm happy to test, but I have one request.
Please send a new, formal all-in-one patch instead of code snippets.
That would make it easier to test and communicate, and would help others
understand the current issues and your approach.

And please say which kernel your patch is based on.

Thanks.

> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5e0fe82a0fae..12bd8c5a4409 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  
>  	smp_wmb(); /* make pte visible before pmd */
>  	pmd_populate(mm, pmd, pgtable);
> +
> +	if (freeze) {
> +		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +			page_remove_rmap(page + i, false);
> +			put_page(page + i);
> +		}
> +	}
>  }
>  
>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> @@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
>  		if (pte_soft_dirty(entry))
>  			swp_pte = pte_swp_mksoft_dirty(swp_pte);
>  		set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
> +		page_remove_rmap(page, false);
> +		put_page(page);
>  	}
>  	pte_unmap_unlock(pte, ptl);
>  }
> @@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
>  		return;
>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
>  	for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
> -		if (!page_mapped(page))
> -			continue;
>  		if (!is_swap_pte(pte[i]))
>  			continue;
>  
> @@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
>  		if (migration_entry_to_page(swp_entry) != page)
>  			continue;
>  
> +		get_page(page);
> +		page_add_anon_rmap(page, vma, address, false);
> +
>  		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
>  		entry = pte_mkdirty(entry);
>  		if (is_write_migration_entry(swp_entry))
> @@ -3181,8 +3191,6 @@ static int __split_huge_page_tail(struct page *head, int tail,
>  	 */
>  	atomic_add(mapcount + 1, &page_tail->_count);
>  
> -	/* after clearing PageTail the gup refcount can be released */
> -	smp_mb__after_atomic();
>  
>  	page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>  	page_tail->flags |= (head->flags &
> @@ -3195,6 +3203,12 @@ static int __split_huge_page_tail(struct page *head, int tail,
>  			 (1L << PG_unevictable)));
>  	page_tail->flags |= (1L << PG_dirty);
>  
> +	/*
> +	 * After clearing PageTail the gup refcount can be released.
> +	 * Page flags also must be visible before we make the page non-compound.
> +	 */
> +	smp_wmb();
> +
>  	clear_compound_head(page_tail);
>  
>  	if (page_is_young(head))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 35643176bc15..e4f8d9fb1c3d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
>  	bool compound = flags & RMAP_COMPOUND;
>  	bool first;
>  
> -	if (PageTransCompound(page)) {
> +	if (compound) {
> +		atomic_t *mapcount;
>  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> -		if (compound) {
> -			atomic_t *mapcount;
> -
> -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> -			mapcount = compound_mapcount_ptr(page);
> -			first = atomic_inc_and_test(mapcount);
> -		} else {
> -			/* Anon THP always mapped first with PMD */
> -			first = 0;
> -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> -			atomic_inc(&page->_mapcount);
> -		}
> +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> +		mapcount = compound_mapcount_ptr(page);
> +		first = atomic_inc_and_test(mapcount);
>  	} else {
>  		VM_BUG_ON_PAGE(compound, page);
>  		first = atomic_inc_and_test(&page->_mapcount);
> @@ -1201,7 +1193,6 @@ void do_page_add_anon_rmap(struct page *page,
>  		 * disabled.
>  		 */
>  		if (compound) {
> -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
>  			__inc_zone_page_state(page,
>  					      NR_ANON_TRANSPARENT_HUGEPAGES);
>  		}
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-03  3:02                     ` Minchan Kim
@ 2015-11-03  7:16                       ` Kirill A. Shutemov
  2015-11-03  7:33                         ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-03  7:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> Hello Kirill,
> 
> On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > Hello Hugh,
> > > > > > > > 
> > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > 
> > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > to be true.
> > > > > > > > > > 
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > 
> > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > 
> > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > series and will test it again.
> > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > 
> > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > 
> > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > finally.
> > > > > >  
> > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > looks like it works.
> > > > > > 
> > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > 
> > > > > > But turn out that's not true.
> > > > > > 
> > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > > 
> > > > > > BOOM!
> > > > > > 
> > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > > will be removed under us we don't corrupt page's state.
> > > > > > 
> > > > > > Please, test.
> > > > > > 
> > > > > 
> > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > 
> > > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > page->mem_cgroup:ffff88007f613c00
> > > > 
> > > > Ignore my previous answer. Still sleeping.
> > > > 
> > > > The right way to fix I think is something like:
> > > > 
> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index 35643176bc15..f2d46792a554 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > > >  	bool compound = flags & RMAP_COMPOUND;
> > > >  	bool first;
> > > >  
> > > > -	if (PageTransCompound(page)) {
> > > > +	if (PageTransCompound(page) && compound) {
> > > > +		atomic_t *mapcount;
> > > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > > -		if (compound) {
> > > > -			atomic_t *mapcount;
> > > > -
> > > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > -			mapcount = compound_mapcount_ptr(page);
> > > > -			first = atomic_inc_and_test(mapcount);
> > > > -		} else {
> > > > -			/* Anon THP always mapped first with PMD */
> > > > -			first = 0;
> > > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > > -			atomic_inc(&page->_mapcount);
> > > > -		}
> > > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > +		mapcount = compound_mapcount_ptr(page);
> > > > +		first = atomic_inc_and_test(mapcount);
> > > >  	} else {
> > > >  		VM_BUG_ON_PAGE(compound, page);
> > > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > > -- 
> > > 
> > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > 
> > > <SNIP>
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> > 
> > Hm. I was not able to trigger this and don't see anything obvious that can
> > lead to this kind of mismatch :-/

I managed to trigger this when I switched back from MADV_DONTNEED to
MADV_FREE. Hm..
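
For readers who have not used the two hints, a minimal userspace sketch of
the difference follows. It is not the oops.c reproducer from this thread;
map_anon(), dontneed_reads_zero() and try_madv_free() are hypothetical
helper names. It assumes Linux semantics for private anonymous memory:
MADV_DONTNEED drops the pages at once and later reads see zeroes, while
MADV_FREE only marks them lazily freeable. MADV_FREE may be missing from
older headers or rejected by older kernels, so the sketch treats that as
"hint not accepted" rather than as an error.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map npages of private anonymous memory, or NULL. */
static unsigned char *map_anon(size_t npages, size_t *len)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *p;

	assert(npages > 0 && psz > 0 && len != NULL);
	*len = npages * (size_t)psz;
	p = mmap(NULL, *len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * MADV_DONTNEED: the pages are dropped immediately and the next access
 * faults in zero-filled pages.
 * Returns 1 if the region reads back as zeroes, 0 if not, -1 on error.
 */
static int dontneed_reads_zero(size_t npages)
{
	size_t len, i;
	int ok = 1;
	unsigned char *buf = map_anon(npages, &len);

	if (!buf)
		return -1;
	memset(buf, 0xaa, len);		/* populate the pages */
	if (madvise(buf, len, MADV_DONTNEED)) {
		munmap(buf, len);
		return -1;
	}
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			ok = 0;
	munmap(buf, len);
	return ok;
}

/*
 * MADV_FREE: the pages stay mapped until reclaim decides to free them,
 * so a read after the hint may still see the old data or zeroes.
 * Returns 0 if the hint was accepted, 1 if it was not (old headers or
 * kernel), -1 if the mapping could not be created.
 */
static int try_madv_free(size_t npages)
{
	size_t len;
	int ret = 1;
	unsigned char *buf = map_anon(npages, &len);

	if (!buf)
		return -1;
	memset(buf, 0xaa, len);
#ifdef MADV_FREE
	if (madvise(buf, len, MADV_FREE) == 0) {
		buf[0] = 0x5a;	/* writing again keeps this page's new data */
		ret = 0;
	}
#endif
	munmap(buf, len);
	return ret;
}
```

Calling dontneed_reads_zero(4) and try_madv_free(4) from a small main() is
enough to see the behaviour. Note that the reports in this thread also need
THP and memory pressure from a memcg limit, which this sketch does not set
up.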

> > I found one more bug: clearing of PageTail can be visible to other CPUs
> > before updated page->flags on the page.
> > 
> > I don't think this bug is connected to what you've reported, but worth
> > testing.
> 
> I'm happy to test, but I have one request.
> Please send a new, formal all-in-one patch instead of code snippets.
> That would make it easier to test and communicate, and would help others
> understand the current issues and your approach.

I'll post a patchset with the refcounting fixes today.

> And please say which kernel your patch is based on.

That's on top of

https://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git since-4.2

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-03  7:16                       ` Kirill A. Shutemov
@ 2015-11-03  7:33                         ` Minchan Kim
  2015-11-03 15:20                           ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-03  7:33 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > Hello Kirill,
> > 
> > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > Hello Hugh,
> > > > > > > > > 
> > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > 
> > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > to be true.
> > > > > > > > > > > 
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > 
> > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > 
> > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > series and will test it again.
> > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > 
> > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > 
> > > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > > finally.
> > > > > > >  
> > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > > looks like it works.
> > > > > > > 
> > > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > > 
> > > > > > > But turn out that's not true.
> > > > > > > 
> > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > > > 
> > > > > > > BOOM!
> > > > > > > 
> > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > > > will be removed under us we don't corrupt page's state.
> > > > > > > 
> > > > > > > Please, test.
> > > > > > > 
> > > > > > 
> > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > > 
> > > > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > > page->mem_cgroup:ffff88007f613c00
> > > > > 
> > > > > Ignore my previous answer. Still sleeping.
> > > > > 
> > > > > The right way to fix I think is something like:
> > > > > 
> > > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > > index 35643176bc15..f2d46792a554 100644
> > > > > --- a/mm/rmap.c
> > > > > +++ b/mm/rmap.c
> > > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > > > >  	bool compound = flags & RMAP_COMPOUND;
> > > > >  	bool first;
> > > > >  
> > > > > -	if (PageTransCompound(page)) {
> > > > > +	if (PageTransCompound(page) && compound) {
> > > > > +		atomic_t *mapcount;
> > > > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > > > -		if (compound) {
> > > > > -			atomic_t *mapcount;
> > > > > -
> > > > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > -			mapcount = compound_mapcount_ptr(page);
> > > > > -			first = atomic_inc_and_test(mapcount);
> > > > > -		} else {
> > > > > -			/* Anon THP always mapped first with PMD */
> > > > > -			first = 0;
> > > > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > > > -			atomic_inc(&page->_mapcount);
> > > > > -		}
> > > > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > +		mapcount = compound_mapcount_ptr(page);
> > > > > +		first = atomic_inc_and_test(mapcount);
> > > > >  	} else {
> > > > >  		VM_BUG_ON_PAGE(compound, page);
> > > > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > > > -- 
> > > > 
> > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > 
> > > > <SNIP>
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> > > 
> > > Hm. I was not able to trigger this and don't see anything obviuous what can
> > > lead to this kind of missmatch :-/
> 
> I managed to trigger this when switched back from MADV_DONTNEED to
> MADV_FREE. Hm..

Hmm.
Which version of MADV_FREE are you testing on?
The old MADV_FREE (i.e., before I posted the MADV_FREE refactoring and
the KSM page fix) had a bug.

I tried your patches on top of my recent MADV_FREE patches.
When I tested with the old THP refcount redesign, however, I couldn't
find any problem so far. That said, I'm not saying it's your fault.

I will give it a shot with MADV_DONTNEED to reproduce the problem.
One thing I can say is that the problem is harder to hit with
MADV_DONTNEED than with MADV_FREE, because the memory pressure in the
MADV_DONTNEED test is not as heavy.

> 
> > > I found one more bug: clearing of PageTail can be visible to other CPUs
> > > before updated page->flags on the page.
> > > 
> > > I don't think this bug is connected to what you've reported, but worth
> > > testing.
> > 
> > I'm happy to test but I ask one thing.
> > I hope you send new formal all-on-one patch instead of code snippets.
> > It can help to test/communicate easy and others understands current
> > issues and your approaches.
> 
> I'll post patchset with refcounting fixes today.

Yep, I will wait. If I get it before leaving the office,
I will queue it on the test machine.

> 
> > And please say what kernel your patch based on.
> 
> That's on top of
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git since-4.2

I have been testing on git://git.cmpxchg.org/linux-mmotm.git.
I guess applying your patch to hannes's tree is not difficult.
I will continue to use hannes's mmotm.

> 
> -- 
>  Kirill A. Shutemov
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-03  7:33                         ` Minchan Kim
@ 2015-11-03 15:20                           ` Minchan Kim
  2015-11-04 14:21                             ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-03 15:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > Hello Kirill,
> > > 
> > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > Hello Hugh,
> > > > > > > > > > 
> > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > to be true.
> > > > > > > > > > > > 
> > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > 
> > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > > 
> > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > series and will test it again.
> > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > > 
> > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > > 
> > > > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > > > finally.
> > > > > > > >  
> > > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > > > looks like it works.
> > > > > > > > 
> > > > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > > > 
> > > > > > > > But turn out that's not true.
> > > > > > > > 
> > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > > > > 
> > > > > > > > BOOM!
> > > > > > > > 
> > > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > > > > will be removed under us we don't corrupt page's state.
> > > > > > > > 
> > > > > > > > Please, test.
> > > > > > > > 
> > > > > > > 
> > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > > > 
> > > > > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > > > page->mem_cgroup:ffff88007f613c00
> > > > > > 
> > > > > > Ignore my previous answer. Still sleeping.
> > > > > > 
> > > > > > The right way to fix I think is something like:
> > > > > > 
> > > > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > > > index 35643176bc15..f2d46792a554 100644
> > > > > > --- a/mm/rmap.c
> > > > > > +++ b/mm/rmap.c
> > > > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > > > > >  	bool compound = flags & RMAP_COMPOUND;
> > > > > >  	bool first;
> > > > > >  
> > > > > > -	if (PageTransCompound(page)) {
> > > > > > +	if (PageTransCompound(page) && compound) {
> > > > > > +		atomic_t *mapcount;
> > > > > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > > > > -		if (compound) {
> > > > > > -			atomic_t *mapcount;
> > > > > > -
> > > > > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > -			mapcount = compound_mapcount_ptr(page);
> > > > > > -			first = atomic_inc_and_test(mapcount);
> > > > > > -		} else {
> > > > > > -			/* Anon THP always mapped first with PMD */
> > > > > > -			first = 0;
> > > > > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > > > > -			atomic_inc(&page->_mapcount);
> > > > > > -		}
> > > > > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > +		mapcount = compound_mapcount_ptr(page);
> > > > > > +		first = atomic_inc_and_test(mapcount);
> > > > > >  	} else {
> > > > > >  		VM_BUG_ON_PAGE(compound, page);
> > > > > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > > > > -- 
> > > > > 
> > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > 
> > > > > <SNIP>
> > > > > 
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> > > > 
> > > > Hm. I was not able to trigger this and don't see anything obviuous what can
> > > > lead to this kind of missmatch :-/
> > 
> > I managed to trigger this when switched back from MADV_DONTNEED to
> > MADV_FREE. Hm..
> 
> Hmm,,
> What version of MADV_FREE do you test on?
> Old MADV_FREE(ie, before posting MADV_FREE refactoring and fix KSM page)
> had a bug.
> 
> I tried your patches on top of recent my MADV_FREE patches.
> But when I try it with old THP refcount redesign, I couldn't find
> any problem so far. However, I'm not saying it's your fault.
> 
> I will give it a shot with MADV_DONTNEED to reproduce the problem.
> But one thing I could say is MADV_DONTNEED is more hard to hit
> compared to MADV_FREE because memory pressure of MADV_DONTNEED test
> wouldn't be heavy.

I reproduced this on a kernel that has no MADV_FREE-related code:

mmotm-2015-10-15-15-20-no-madvise_free, i.e. the git head is
54bad5da4834 ("arm64: add pmd_[dirty|mkclean] for THP"), so there is no
MADV_FREE code in it
+ pte_mkdirty patch
+ freeze/unfreeze patch
+ do_page_add_anon_rmap patch

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:1 val:511
BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:2 val:1
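
For anyone decoding the two lines above: the following is a minimal
userspace sketch, not kernel code, that parses a "Bad rss-counter state"
line and names the counter. It assumes the counter order used in
include/linux/mm_types.h around v4.3 (MM_FILEPAGES, MM_ANONPAGES,
MM_SWAPENTS); the helpers rss_counter_name() and parse_rss_line() are
hypothetical names made up for this illustration.

```c
#include <stdio.h>

/* Assumed counter order (v4.3-era include/linux/mm_types.h). */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, NR_MM_COUNTERS };

/* Map an "idx:" value from the BUG line to a counter name. */
static const char *rss_counter_name(int idx)
{
	static const char *const names[NR_MM_COUNTERS] = {
		"MM_FILEPAGES", "MM_ANONPAGES", "MM_SWAPENTS",
	};

	if (idx < 0 || idx >= NR_MM_COUNTERS)
		return "unknown";
	return names[idx];
}

/*
 * Parse one "BUG: Bad rss-counter state mm:<ptr> idx:<n> val:<n>" line.
 * Returns 0 on success and -1 if the line does not match.
 */
static int parse_rss_line(const char *line, int *idx, long *val)
{
	unsigned long long mm; /* parsed, but not used further here */

	if (sscanf(line, "BUG: Bad rss-counter state mm:%llx idx:%d val:%ld",
		   &mm, idx, val) != 3)
		return -1;
	return 0;
}
```

Under that assumption, idx:1 is the anonymous-page counter and idx:2 the
swap-entry counter. In this report the two leftovers add up to
511 + 1 = 512, which equals the number of 4K pages in one 2M huge page
on x86-64; whether that is more than a coincidence is not established
here.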



* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-03 15:20                           ` Minchan Kim
@ 2015-11-04 14:21                             ` Kirill A. Shutemov
  2015-11-05  0:19                               ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-04 14:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > Hello Kirill,
> > > > 
> > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > 
> > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > 
> > > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > > > 
> > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > > series and will test it again.
> > > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > > > 
> > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > > > 
> > > > > > > > > It took too long time (and a lot of printk()), but I think I track it down
> > > > > > > > > finally.
> > > > > > > > >  
> > > > > > > > > The patch below seems fixes issue for me. It's not yet properly tested, but
> > > > > > > > > looks like it works.
> > > > > > > > > 
> > > > > > > > > The problem was my wrong assumption on how migration works: I thought that
> > > > > > > > > kernel would wait migration to finish on before deconstruction mapping.
> > > > > > > > > 
> > > > > > > > > But turn out that's not true.
> > > > > > > > > 
> > > > > > > > > As result if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > > with page which is not mapped anymore but has _count and _mapcount
> > > > > > > > > elevated. The page is on LRU too. So it's still reachable by vmscan and by
> > > > > > > > > pfn scanners (Sasha showed few similar traces from compaction too).
> > > > > > > > > It's likely that page->mapping in this case would point to freed anon_vma.
> > > > > > > > > 
> > > > > > > > > BOOM!
> > > > > > > > > 
> > > > > > > > > The patch modify freeze/unfreeze_page() code to match normal migration
> > > > > > > > > entries logic: on setup we remove page from rmap and drop pin, on removing
> > > > > > > > > we get pin back and put page on rmap. This way even if migration entry
> > > > > > > > > will be removed under us we don't corrupt page's state.
> > > > > > > > > 
> > > > > > > > > Please, test.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > > > > 
> > > > > > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > > > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > > > > page->mem_cgroup:ffff88007f613c00
> > > > > > > 
> > > > > > > Ignore my previous answer. Still sleeping.
> > > > > > > 
> > > > > > > The right way to fix I think is something like:
> > > > > > > 
> > > > > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > > > > index 35643176bc15..f2d46792a554 100644
> > > > > > > --- a/mm/rmap.c
> > > > > > > +++ b/mm/rmap.c
> > > > > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > > > > > >  	bool compound = flags & RMAP_COMPOUND;
> > > > > > >  	bool first;
> > > > > > >  
> > > > > > > -	if (PageTransCompound(page)) {
> > > > > > > +	if (PageTransCompound(page) && compound) {
> > > > > > > +		atomic_t *mapcount;
> > > > > > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > > > > > -		if (compound) {
> > > > > > > -			atomic_t *mapcount;
> > > > > > > -
> > > > > > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > -			mapcount = compound_mapcount_ptr(page);
> > > > > > > -			first = atomic_inc_and_test(mapcount);
> > > > > > > -		} else {
> > > > > > > -			/* Anon THP always mapped first with PMD */
> > > > > > > -			first = 0;
> > > > > > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > > > > > -			atomic_inc(&page->_mapcount);
> > > > > > > -		}
> > > > > > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > +		mapcount = compound_mapcount_ptr(page);
> > > > > > > +		first = atomic_inc_and_test(mapcount);
> > > > > > >  	} else {
> > > > > > >  		VM_BUG_ON_PAGE(compound, page);
> > > > > > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > > > > > -- 
> > > > > > 
> > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > 
> > > > > > <SNIP>
> > > > > > 
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> > > > > 
> > > > > Hm. I was not able to trigger this and don't see anything obviuous what can
> > > > > lead to this kind of missmatch :-/
> > > 
> > > I managed to trigger this when switched back from MADV_DONTNEED to
> > > MADV_FREE. Hm..
> > 
> > Hmm,,
> > What version of MADV_FREE do you test on?
> > Old MADV_FREE(ie, before posting MADV_FREE refactoring and fix KSM page)
> > had a bug.
> > 
> > I tried your patches on top of recent my MADV_FREE patches.
> > But when I try it with old THP refcount redesign, I couldn't find
> > any problem so far. However, I'm not saying it's your fault.
> > 
> > I will give it a shot with MADV_DONTNEED to reproduce the problem.
> > But one thing I could say is MADV_DONTNEED is more hard to hit
> > compared to MADV_FREE because memory pressure of MADV_DONTNEED test
> > wouldn't be heavy.
> 
> I reproduced this on the kernel which has no code related to MADV_FREE:
> 
> mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> MADV_FREE code in there
> + pte_mkdirty patch
> + freeze/unfreeze patch
> + do_page_add_anon_rmap patch
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:1 val:511
> BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:2 val:1

I have one idea about why this could happen, but I'm not sure yet.

Could you check whether the patch below makes any difference for you?

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 5c7b00e88236..194f7f8b8c66 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -103,12 +103,7 @@ void deferred_split_huge_page(struct page *page);
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address);
 
-#define split_huge_pmd(__vma, __pmd, __address)				\
-	do {								\
-		pmd_t *____pmd = (__pmd);				\
-		if (pmd_trans_huge(*____pmd))				\
-			__split_huge_pmd(__vma, __pmd, __address);	\
-	}  while (0)
+#define split_huge_pmd(__vma, __pmd, __address)	__split_huge_pmd(__vma, __pmd, __address)
-- 
 Kirill A. Shutemov
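
To illustrate what the macro change in the diff above does mechanically,
here is a small userspace sketch. It is only a model under stated
assumptions: the one-field pmd_t, pmd_trans_huge() and
mock_split_huge_pmd() are hypothetical stand-ins that merely count
calls, and they say nothing about what the real __split_huge_pmd() does
once it is entered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pmd_t: one flag only. */
typedef struct { bool trans_huge; } pmd_t;

static int split_calls;

static bool pmd_trans_huge(pmd_t pmd)
{
	return pmd.trans_huge;
}

/* Mock of __split_huge_pmd(): only counts how often it is reached. */
static void mock_split_huge_pmd(void *vma, pmd_t *pmd, unsigned long address)
{
	(void)vma;
	(void)pmd;
	(void)address;
	split_calls++;
}

/* Old shape: the caller-side macro filters on pmd_trans_huge(). */
#define split_huge_pmd_old(v, p, a)				\
	do {							\
		pmd_t *pmd_ptr_ = (p);				\
		if (pmd_trans_huge(*pmd_ptr_))			\
			mock_split_huge_pmd(v, pmd_ptr_, a);	\
	} while (0)

/* New shape from the diff: always call into the split function. */
#define split_huge_pmd_new(v, p, a)	mock_split_huge_pmd(v, p, a)

static int calls_with_old_macro(bool huge)
{
	pmd_t pmd = { .trans_huge = huge };

	split_calls = 0;
	split_huge_pmd_old(NULL, &pmd, 0UL);
	return split_calls;
}

static int calls_with_new_macro(bool huge)
{
	pmd_t pmd = { .trans_huge = huge };

	split_calls = 0;
	split_huge_pmd_new(NULL, &pmd, 0UL);
	return split_calls;
}
```

With the old macro, a PMD for which pmd_trans_huge() is false never
reaches the split function; with the new one the function is always
entered, so any decision has to be made inside it.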


* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-04 14:21                             ` Kirill A. Shutemov
@ 2015-11-05  0:19                               ` Minchan Kim
  2015-11-08 22:55                                 ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-05  0:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > Hello Kirill,
> > > > > 
> > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > 
> > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > > > > 
> > > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > > > > 
> > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > > > > 
> > > > > > > > > > It took a long time (and a lot of printk()), but I think I have finally
> > > > > > > > > > tracked it down.
> > > > > > > > > > 
> > > > > > > > > > The patch below seems to fix the issue for me. It's not yet properly tested,
> > > > > > > > > > but it looks like it works.
> > > > > > > > > > 
> > > > > > > > > > The problem was my wrong assumption about how migration works: I thought the
> > > > > > > > > > kernel would wait for migration to finish before tearing down the mapping.
> > > > > > > > > > 
> > > > > > > > > > But it turns out that's not true.
> > > > > > > > > > 
> > > > > > > > > > As a result, if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > > > with a page which is not mapped anymore but has _count and _mapcount
> > > > > > > > > > elevated. The page is on the LRU too, so it's still reachable by vmscan and
> > > > > > > > > > by pfn scanners (Sasha showed a few similar traces from compaction too).
> > > > > > > > > > It's likely that page->mapping in this case would point to a freed anon_vma.
> > > > > > > > > > 
> > > > > > > > > > BOOM!
> > > > > > > > > > 
> > > > > > > > > > The patch modifies the freeze/unfreeze_page() code to match the normal
> > > > > > > > > > migration entry logic: on setup we remove the page from rmap and drop the
> > > > > > > > > > pin; on removal we take the pin back and put the page on rmap. This way,
> > > > > > > > > > even if the migration entry is removed under us, we don't corrupt the
> > > > > > > > > > page's state.
> > > > > > > > > > 
> > > > > > > > > > Please test.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > > > > > 
> > > > > > > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > > > > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > > > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > > > > > page->mem_cgroup:ffff88007f613c00
> > > > > > > > 
> > > > > > > > Ignore my previous answer. Still sleeping.
> > > > > > > > 
> > > > > > > > The right way to fix I think is something like:
> > > > > > > > 
> > > > > > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > > > > > index 35643176bc15..f2d46792a554 100644
> > > > > > > > --- a/mm/rmap.c
> > > > > > > > +++ b/mm/rmap.c
> > > > > > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > > > > > > >  	bool compound = flags & RMAP_COMPOUND;
> > > > > > > >  	bool first;
> > > > > > > >  
> > > > > > > > -	if (PageTransCompound(page)) {
> > > > > > > > +	if (PageTransCompound(page) && compound) {
> > > > > > > > +		atomic_t *mapcount;
> > > > > > > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > > > > > > -		if (compound) {
> > > > > > > > -			atomic_t *mapcount;
> > > > > > > > -
> > > > > > > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > > -			mapcount = compound_mapcount_ptr(page);
> > > > > > > > -			first = atomic_inc_and_test(mapcount);
> > > > > > > > -		} else {
> > > > > > > > -			/* Anon THP always mapped first with PMD */
> > > > > > > > -			first = 0;
> > > > > > > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > > > > > > -			atomic_inc(&page->_mapcount);
> > > > > > > > -		}
> > > > > > > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > > +		mapcount = compound_mapcount_ptr(page);
> > > > > > > > +		first = atomic_inc_and_test(mapcount);
> > > > > > > >  	} else {
> > > > > > > >  		VM_BUG_ON_PAGE(compound, page);
> > > > > > > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > > > > > > -- 
> > > > > > > 
> > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > > > > > > 
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > 
> > > > > > > <SNIP>
> > > > > > > 
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> > > > > > 
> > > > > > Hm. I was not able to trigger this and don't see anything obvious that could
> > > > > > lead to this kind of mismatch :-/
> > > > 
> > > > I managed to trigger this when switched back from MADV_DONTNEED to
> > > > MADV_FREE. Hm..
> > > 
> > > Hmm,,
> > > What version of MADV_FREE do you test on?
> > > Old MADV_FREE(ie, before posting MADV_FREE refactoring and fix KSM page)
> > > had a bug.
> > > 
> > > I tried your patches on top of recent my MADV_FREE patches.
> > > But when I try it with old THP refcount redesign, I couldn't find
> > > any problem so far. However, I'm not saying it's your fault.
> > > 
> > > I will give it a shot with MADV_DONTNEED to reproduce the problem.
> > > But one thing I could say is MADV_DONTNEED is more hard to hit
> > > compared to MADV_FREE because memory pressure of MADV_DONTNEED test
> > > wouldn't be heavy.
> > 
> > I reproduced this on the kernel which has no code related to MADV_FREE:
> > 
> > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> > MADV_FREE code in there
> > + pte_mkdirty patch
> > + freeze/unfreeze patch
> > + do_page_add_anon_rmap patch
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:1 val:511
> > BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:2 val:1
> 
> I have one idea why it could happen, but not sure yet..
> 
> Could you check if it makes any difference for you?
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 5c7b00e88236..194f7f8b8c66 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -103,12 +103,7 @@ void deferred_split_huge_page(struct page *page);
>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  		unsigned long address);
>  
> -#define split_huge_pmd(__vma, __pmd, __address)				\
> -	do {								\
> -		pmd_t *____pmd = (__pmd);				\
> -		if (pmd_trans_huge(*____pmd))				\
> -			__split_huge_pmd(__vma, __pmd, __address);	\
> -	}  while (0)
> +#define split_huge_pmd(__vma, __pmd, __address)	__split_huge_pmd(__vma, __pmd, __address)

Kernel: mmotm-2015-10-15-15-20-no-madvise_free, i.e. git HEAD at
54bad5da4834 ("arm64: add pmd_[dirty|mkclean] for THP"), so there is no
MADV_FREE code in there
 + pte_mkdirty patch
 + freeze/unfreeze patch
 + do_page_add_anon_rmap patch
 + above split_huge_pmd patch


Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff88007fa3bb80 idx:1 val:512
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
PGD 0 
Oops: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 11 PID: 59 Comm: khugepaged Not tainted 4.3.0-rc5-mm1-no-madv-free+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffea0000154b80 RCX: ffff8800b985f918
RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b903cff8
R10: ffff8800b903d168 R11: ffff8800b985f7b8 R12: ffff88007ef6c731
R13: ffff88007ef6c730 R14: 0000000000000008 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8800bfb60000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
Stack:
 ffff8800b985f7a8 ffffffff81124f20 ffffea0000154b80 ffff8800b985f818
 ffff88007ff4dc00 0000000000000000 ffff8800b985f7f0 ffffffff81125663
 0000000000000000 ffffffff818446a0 ffffea0000154b80 ffff8800b985f918
Call Trace:
 [<ffffffff81124f20>] page_lock_anon_vma_read+0x60/0x180
 [<ffffffff81125663>] rmap_walk+0x1b3/0x3f0
 [<ffffffff81125a43>] page_referenced+0x1a3/0x220
 [<ffffffff81123e20>] ? __page_check_address+0x1a0/0x1a0
 [<ffffffff81124ec0>] ? page_get_anon_vma+0xd0/0xd0
 [<ffffffff81123810>] ? anon_vma_ctor+0x40/0x40
 [<ffffffff8110086b>] shrink_page_list+0x5ab/0xde0
 [<ffffffff8110173c>] shrink_inactive_list+0x18c/0x4b0
 [<ffffffff811023ad>] shrink_lruvec+0x59d/0x740
 [<ffffffff811025e0>] shrink_zone+0x90/0x250
 [<ffffffff811028cd>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff81102d2d>] try_to_free_mem_cgroup_pages+0x9d/0x120
 [<ffffffff81149949>] try_charge+0x1f9/0x670
 [<ffffffff810fb030>] ? lru_cache_add_file+0x40/0x40
 [<ffffffff8114d0a6>] mem_cgroup_try_charge+0x86/0x120
 [<ffffffff811433bc>] khugepaged+0x7cc/0x1ac0
 [<ffffffff81064f01>] ? __clear_sched_clock_stable+0x11/0x20
 [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81142bf0>] ? __split_huge_pmd_locked+0x4a0/0x4a0
 [<ffffffff81056cd9>] kthread+0xc9/0xe0
 [<ffffffff81056c10>] ? kthread_park+0x60/0x60
 [<ffffffff8142066f>] ret_from_fork+0x3f/0x70
 [<ffffffff81056c10>] ? kthread_park+0x60/0x60
Code: 6e 7b 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 ab 63 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
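
A note on reading the oops above: the fault address (CR2) and RDI are both
0x8, and the faulting function is down_read_trylock() called from
page_lock_anon_vma_read(). That pattern is what one would expect if the root
anon_vma pointer was NULL and the lock lives one pointer past the start of
the structure. R12 and R13 differ only in the low bit, which looks like a
page->mapping value with the anon tag set and the anon_vma pointer derived
from it. The sketch below only reproduces that arithmetic in userspace. The
mock_* structures and the tag value are simplified assumptions made for
illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-ins for illustration only; these are NOT the kernel's
 * definitions. The assumption is that the lock sits right after the
 * leading 'root' pointer, as the fault address suggests.
 */
struct mock_rwsem {
	long count;
};

struct mock_anon_vma {
	struct mock_anon_vma *root;	/* first member, offset 0 */
	struct mock_rwsem rwsem;	/* next member */
};

/* The lock is expected exactly one pointer into the structure. */
static_assert(offsetof(struct mock_anon_vma, rwsem) == sizeof(void *),
	      "rwsem should directly follow the root pointer");

/* Low bit of page->mapping marks an anonymous page (assumed value 1). */
#define MOCK_PAGE_MAPPING_ANON	0x1ULL

/* Strip the anon tag from a page->mapping value; 0 if the tag is absent. */
static uint64_t mock_mapping_to_anon_vma(uint64_t mapping)
{
	if (!(mapping & MOCK_PAGE_MAPPING_ANON))
		return 0;
	return mapping - MOCK_PAGE_MAPPING_ANON;
}

/* Address handed to the trylock when the root pointer has value 'root'. */
static uint64_t mock_root_rwsem_addr(uint64_t root)
{
	return root + offsetof(struct mock_anon_vma, rwsem);
}
```

Feeding R12 (ffff88007ef6c731) to mock_mapping_to_anon_vma() gives the value
seen in R13, and mock_root_rwsem_addr(0) gives 8 on a 64-bit build, which
matches RDI and CR2 in the dump.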




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-05  0:19                               ` Minchan Kim
@ 2015-11-08 22:55                                 ` Kirill A. Shutemov
  2015-11-12  0:36                                   ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-08 22:55 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote:
> On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > > Hello Kirill,
> > > > > > 
> > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > > > > > 
> > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > > > > > 
> > > > > > > > > > > It took a long time (and a lot of printk()), but I think I have finally
> > > > > > > > > > > tracked it down.
> > > > > > > > > > > 
> > > > > > > > > > > The patch below seems to fix the issue for me. It's not yet properly tested,
> > > > > > > > > > > but it looks like it works.
> > > > > > > > > > > 
> > > > > > > > > > > The problem was my wrong assumption about how migration works: I thought the
> > > > > > > > > > > kernel would wait for migration to finish before tearing down the mapping.
> > > > > > > > > > > 
> > > > > > > > > > > But it turns out that's not true.
> > > > > > > > > > > 
> > > > > > > > > > > As a result, if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > > > > with a page which is not mapped anymore but has _count and _mapcount
> > > > > > > > > > > elevated. The page is on the LRU too, so it's still reachable by vmscan and
> > > > > > > > > > > by pfn scanners (Sasha showed a few similar traces from compaction too).
> > > > > > > > > > > It's likely that page->mapping in this case would point to a freed anon_vma.
> > > > > > > > > > > 
> > > > > > > > > > > BOOM!
> > > > > > > > > > > 
> > > > > > > > > > > The patch modifies the freeze/unfreeze_page() code to match the normal
> > > > > > > > > > > migration entry logic: on setup we remove the page from rmap and drop the
> > > > > > > > > > > pin; on removal we take the pin back and put the page on rmap. This way,
> > > > > > > > > > > even if the migration entry is removed under us, we don't corrupt the
> > > > > > > > > > > page's state.
> > > > > > > > > > > 
> > > > > > > > > > > Please test.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > > > > > > 
> > > > > > > > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > > > > > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > > > > > > page->mem_cgroup:ffff88007f613c00
> > > > > > > > > 
> > > > > > > > > Ignore my previous answer. Still sleeping.
> > > > > > > > > 
> > > > > > > > > The right way to fix I think is something like:
> > > > > > > > > 
> > > > > > > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > > > > > > index 35643176bc15..f2d46792a554 100644
> > > > > > > > > --- a/mm/rmap.c
> > > > > > > > > +++ b/mm/rmap.c
> > > > > > > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > > > > > > > >  	bool compound = flags & RMAP_COMPOUND;
> > > > > > > > >  	bool first;
> > > > > > > > >  
> > > > > > > > > -	if (PageTransCompound(page)) {
> > > > > > > > > +	if (PageTransCompound(page) && compound) {
> > > > > > > > > +		atomic_t *mapcount;
> > > > > > > > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > > > > > > > -		if (compound) {
> > > > > > > > > -			atomic_t *mapcount;
> > > > > > > > > -
> > > > > > > > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > > > -			mapcount = compound_mapcount_ptr(page);
> > > > > > > > > -			first = atomic_inc_and_test(mapcount);
> > > > > > > > > -		} else {
> > > > > > > > > -			/* Anon THP always mapped first with PMD */
> > > > > > > > > -			first = 0;
> > > > > > > > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > > > > > > > -			atomic_inc(&page->_mapcount);
> > > > > > > > > -		}
> > > > > > > > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > > > +		mapcount = compound_mapcount_ptr(page);
> > > > > > > > > +		first = atomic_inc_and_test(mapcount);
> > > > > > > > >  	} else {
> > > > > > > > >  		VM_BUG_ON_PAGE(compound, page);
> > > > > > > > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > > > > > > > -- 
> > > > > > > > 
> > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > > > > > > > 
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > 
> > > > > > > > <SNIP>
> > > > > > > > 
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > > > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> > > > > > > 
> > > > > > > Hm. I was not able to trigger this and don't see anything obvious that could
> > > > > > > lead to this kind of mismatch :-/
> > > > > 
> > > > > I managed to trigger this when switched back from MADV_DONTNEED to
> > > > > MADV_FREE. Hm..
> > > > 
> > > > Hmm,,
> > > > What version of MADV_FREE do you test on?
> > > > Old MADV_FREE(ie, before posting MADV_FREE refactoring and fix KSM page)
> > > > had a bug.
> > > > 
> > > > I tried your patches on top of recent my MADV_FREE patches.
> > > > But when I try it with old THP refcount redesign, I couldn't find
> > > > any problem so far. However, I'm not saying it's your fault.
> > > > 
> > > > I will give it a shot with MADV_DONTNEED to reproduce the problem.
> > > > But one thing I could say is MADV_DONTNEED is more hard to hit
> > > > compared to MADV_FREE because memory pressure of MADV_DONTNEED test
> > > > wouldn't be heavy.
> > > 
> > > I reproduced this on the kernel which has no code related to MADV_FREE:
> > > 
> > > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> > > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> > > MADV_FREE code in there
> > > + pte_mkdirty patch
> > > + freeze/unfreeze patch
> > > + do_page_add_anon_rmap patch
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:1 val:511
> > > BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:2 val:1
> > 
> > I have one idea why it could happen, but not sure yet..
> > 
> > Could you check if it makes any difference for you?
> > 
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 5c7b00e88236..194f7f8b8c66 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -103,12 +103,7 @@ void deferred_split_huge_page(struct page *page);
> >  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >  		unsigned long address);
> >  
> > -#define split_huge_pmd(__vma, __pmd, __address)				\
> > -	do {								\
> > -		pmd_t *____pmd = (__pmd);				\
> > -		if (pmd_trans_huge(*____pmd))				\
> > -			__split_huge_pmd(__vma, __pmd, __address);	\
> > -	}  while (0)
> > +#define split_huge_pmd(__vma, __pmd, __address)	__split_huge_pmd(__vma, __pmd, __address)
> 
> mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> MADV_FREE code in there
>  + pte_mkdirty patch
>  + freeze/unfreeze patch
>  + do_page_add_anon_rmap patch
>  + above split_huge_pmd
> 
> 
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> BUG: Bad rss-counter state mm:ffff88007fa3bb80 idx:1 val:512
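
A side note on the numbers in these reports: the leftover counts are either
512 on one counter, or 511 on one counter plus 1 on the next. On x86-64 with
4 KiB base pages, a 2 MiB PMD-sized huge page covers exactly 512 base pages,
so each report is consistent with one huge page worth of accounting being
left behind. The sketch below just spells out that arithmetic. The names are
hypothetical, and treating idx:1 and idx:2 as the anonymous-page and
swap-entry counters is an assumption about this tree, not something stated
in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 geometry: 4 KiB base pages, 2 MiB PMD-sized huge pages. */
#define MOCK_PAGE_SHIFT		12
#define MOCK_HPAGE_PMD_SHIFT	21

/* Number of base pages covered by one PMD-sized huge page. */
static long mock_hpage_pmd_nr(void)
{
	return 1L << (MOCK_HPAGE_PMD_SHIFT - MOCK_PAGE_SHIFT);
}

/*
 * True if the leftover counters reported at mm teardown add up to exactly
 * one huge page worth of base pages.
 */
static bool mock_leak_is_one_thp(long anon_left, long swap_left)
{
	return anon_left + swap_left == mock_hpage_pmd_nr();
}
```

Both mock_leak_is_one_thp(512, 0) and mock_leak_is_one_thp(511, 1) hold,
which covers the two shapes of report seen so far.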

With the patch below, my test setup ran for 2+ days without triggering the
bug. The split_huge_pmd patch should be dropped.

Please test.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 14cbbad54a3e..7aa0a3fef2aa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2841,9 +2841,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	write = pmd_write(*pmd);
 	young = pmd_young(*pmd);
 
-	/* leave pmd empty until pte is filled */
-	pmdp_huge_clear_flush_notify(vma, haddr, pmd);
-
 	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
@@ -2893,6 +2890,28 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 
 	smp_wmb(); /* make pte visible before pmd */
+	/*
+	 * Up to this point the pmd is present and huge and userland has the
+	 * whole access to the hugepage during the split (which happens in
+	 * place). If we overwrite the pmd with the not-huge version pointing
+	 * to the pte here (which of course we could if all CPUs were bug
+	 * free), userland could trigger a small page size TLB miss on the
+	 * small sized TLB while the hugepage TLB entry is still established in
+	 * the huge TLB. Some CPU doesn't like that.
+	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
+	 * 383 on page 93. Intel should be safe but is also warns that it's
+	 * only safe if the permission and cache attributes of the two entries
+	 * loaded in the two TLB is identical (which should be the case here).
+	 * But it is generally safer to never allow small and huge TLB entries
+	 * for the same virtual address to be loaded simultaneously. So instead
+	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
+	 * current pmd notpresent (atomically because here the pmd_trans_huge
+	 * and pmd_trans_splitting must remain set at all times on the pmd
+	 * until the split is complete for this pmd), then we flush the SMP TLB
+	 * and finally we write the non-huge version of the pmd entry with
+	 * pmd_populate.
+	 */
+	pmdp_invalidate(vma, haddr, pmd);
 	pmd_populate(mm, pmd, pgtable);
 
 	if (freeze) {
-- 
 Kirill A. Shutemov
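
For readers following the patch: the removed hunk left the pmd empty while
the ptes were being filled, whereas the added hunk keeps the huge pmd in
place until the ptes are ready, then marks it not present (still huge) and
finally populates it with the pte table. The toy model below only lists the
pmd states each ordering passes through and checks whether an observer could
sample an empty pmd. It is not kernel code, the state names are made up for
illustration, and whether an empty pmd seen by a concurrent walker is the
actual cause of the rss mismatch is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pmd states for the toy model. */
enum mock_pmd_state {
	MOCK_PMD_HUGE,		/* present, maps the huge page */
	MOCK_PMD_EMPTY,		/* cleared */
	MOCK_PMD_INVALIDATED,	/* not present, still marked huge */
	MOCK_PMD_TABLE,		/* points to the filled pte table */
};

/* Ordering before the patch: clear first, fill ptes, then populate. */
static const enum mock_pmd_state mock_old_order[] = {
	MOCK_PMD_HUGE, MOCK_PMD_EMPTY, MOCK_PMD_TABLE,
};

/* Ordering after the patch: fill ptes, invalidate, then populate. */
static const enum mock_pmd_state mock_new_order[] = {
	MOCK_PMD_HUGE, MOCK_PMD_INVALIDATED, MOCK_PMD_TABLE,
};

/* True if an observer could ever sample an empty pmd in this sequence. */
static bool mock_can_observe_empty(const enum mock_pmd_state *seq, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (seq[i] == MOCK_PMD_EMPTY)
			return true;
	return false;
}
```

In this model mock_can_observe_empty() is true for mock_old_order and false
for mock_new_order, which is the difference the patch is aiming for.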

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-08 22:55                                 ` Kirill A. Shutemov
@ 2015-11-12  0:36                                   ` Minchan Kim
  2015-11-16  1:45                                     ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-12  0:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Mon, Nov 09, 2015 at 12:55:22AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 05, 2015 at 09:19:22AM +0900, Minchan Kim wrote:
> > On Wed, Nov 04, 2015 at 04:21:35PM +0200, Kirill A. Shutemov wrote:
> > > On Wed, Nov 04, 2015 at 12:20:19AM +0900, Minchan Kim wrote:
> > > > On Tue, Nov 03, 2015 at 04:33:29PM +0900, Minchan Kim wrote:
> > > > > On Tue, Nov 03, 2015 at 09:16:50AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Tue, Nov 03, 2015 at 12:02:58PM +0900, Minchan Kim wrote:
> > > > > > > Hello Kirill,
> > > > > > > 
> > > > > > > On Mon, Nov 02, 2015 at 02:57:49PM +0200, Kirill A. Shutemov wrote:
> > > > > > > > On Fri, Oct 30, 2015 at 04:03:50PM +0900, Minchan Kim wrote:
> > > > > > > > > On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > > On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > > > > > > > > > > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > > > > > > > > > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > > > > > > > > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > > > > > > > > > > Hello Hugh,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > > > > > > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I added the code to check it and queued it again but I had another oops
> > > > > > > > > > > > > > > > in this time but symptom is related to anon_vma, too.
> > > > > > > > > > > > > > > > (kernel is based on recent mmotm + unconditional mkdirty for bug fix)
> > > > > > > > > > > > > > > > It seems page_get_anon_vma returns NULL since the page was not page_mapped
> > > > > > > > > > > > > > > > at that time but second check of page_mapped right before try_to_unmap seems
> > > > > > > > > > > > > > > > to be true.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > > > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > > > > > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > > > > > > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > > > > > > > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > > > > > > > > > > series and will test it again.
> > > > > > > > > > > > > > If it is fixed, I will test again with your migration patchset, then.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I tested mmotm-2015-10-15-15-20 with test program I attach for a long time.
> > > > > > > > > > > > > Therefore, there is no patchset from Hugh's migration patch in there.
> > > > > > > > > > > > > And I added below debug code with request from Kirill to all test kernels.
> > > > > > > > > > > > 
> > > > > > > > > > > > It took a long time (and a lot of printk()), but I think I have finally
> > > > > > > > > > > > tracked it down.
> > > > > > > > > > > > 
> > > > > > > > > > > > The patch below seems to fix the issue for me. It's not yet properly tested,
> > > > > > > > > > > > but it looks like it works.
> > > > > > > > > > > > 
> > > > > > > > > > > > The problem was my wrong assumption about how migration works: I thought the
> > > > > > > > > > > > kernel would wait for migration to finish before tearing down the mapping.
> > > > > > > > > > > > 
> > > > > > > > > > > > But it turns out that's not true.
> > > > > > > > > > > > 
> > > > > > > > > > > > As a result, if zap_pte_range() races with split_huge_page(), we can end up
> > > > > > > > > > > > with a page which is not mapped anymore but has _count and _mapcount
> > > > > > > > > > > > elevated. The page is on the LRU too, so it's still reachable by vmscan and
> > > > > > > > > > > > by pfn scanners (Sasha showed a few similar traces from compaction too).
> > > > > > > > > > > > It's likely that page->mapping in this case would point to a freed anon_vma.
> > > > > > > > > > > > 
> > > > > > > > > > > > BOOM!
> > > > > > > > > > > > 
> > > > > > > > > > > > The patch modifies the freeze/unfreeze_page() code to match the normal
> > > > > > > > > > > > migration entry logic: on setup we remove the page from rmap and drop the
> > > > > > > > > > > > pin; on removal we take the pin back and put the page on rmap. This way,
> > > > > > > > > > > > even if the migration entry is removed under us, we don't corrupt the
> > > > > > > > > > > > page's state.
> > > > > > > > > > > > 
> > > > > > > > > > > > Please test.
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch, I tested
> > > > > > > > > > > one I sent to you(ie, oops.c + memcg_test.sh)
> > > > > > > > > > > 
> > > > > > > > > > > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > > > > > > > > > > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > > > > > > > > > > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > > > > > > > > > > page->mem_cgroup:ffff88007f613c00
> > > > > > > > > > 
> > > > > > > > > > Ignore my previous answer. Still sleeping.
> > > > > > > > > > 
> > > > > > > > > > The right way to fix I think is something like:
> > > > > > > > > > 
> > > > > > > > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > > > > > > > index 35643176bc15..f2d46792a554 100644
> > > > > > > > > > --- a/mm/rmap.c
> > > > > > > > > > +++ b/mm/rmap.c
> > > > > > > > > > @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
> > > > > > > > > >  	bool compound = flags & RMAP_COMPOUND;
> > > > > > > > > >  	bool first;
> > > > > > > > > >  
> > > > > > > > > > -	if (PageTransCompound(page)) {
> > > > > > > > > > +	if (PageTransCompound(page) && compound) {
> > > > > > > > > > +		atomic_t *mapcount;
> > > > > > > > > >  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > > > > > > > > -		if (compound) {
> > > > > > > > > > -			atomic_t *mapcount;
> > > > > > > > > > -
> > > > > > > > > > -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > > > > -			mapcount = compound_mapcount_ptr(page);
> > > > > > > > > > -			first = atomic_inc_and_test(mapcount);
> > > > > > > > > > -		} else {
> > > > > > > > > > -			/* Anon THP always mapped first with PMD */
> > > > > > > > > > -			first = 0;
> > > > > > > > > > -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > > > > > > > > > -			atomic_inc(&page->_mapcount);
> > > > > > > > > > -		}
> > > > > > > > > > +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > > > > > > > > > +		mapcount = compound_mapcount_ptr(page);
> > > > > > > > > > +		first = atomic_inc_and_test(mapcount);
> > > > > > > > > >  	} else {
> > > > > > > > > >  		VM_BUG_ON_PAGE(compound, page);
> > > > > > > > > >  		first = atomic_inc_and_test(&page->_mapcount);
> > > > > > > > > > -- 
> > > > > > > > > 
> > > > > > > > > kernel: On mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + above patch,
> > > > > > > > > 
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > 
> > > > > > > > > <SNIP>
> > > > > > > > > 
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
> > > > > > > > > BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
> > > > > > > > 
> > > > > > > > Hm. I was not able to trigger this and don't see anything obvious that
> > > > > > > > could lead to this kind of mismatch :-/
> > > > > > 
> > > > > > I managed to trigger this when I switched back from MADV_DONTNEED to
> > > > > > MADV_FREE. Hm..
> > > > > 
> > > > > Hmm,
> > > > > Which version of MADV_FREE did you test on?
> > > > > The old MADV_FREE (i.e., before I posted the MADV_FREE refactoring and
> > > > > the KSM page fix) had a bug.
> > > > > 
> > > > > I tried your patches on top of my recent MADV_FREE patches.
> > > > > But when I tried it with the old THP refcount redesign, I couldn't find
> > > > > any problem so far. However, I'm not saying it's your fault.
> > > > > 
> > > > > I will give it a shot with MADV_DONTNEED to reproduce the problem.
> > > > > But one thing I can say is that the problem is harder to hit with
> > > > > MADV_DONTNEED than with MADV_FREE, because the memory pressure of the
> > > > > MADV_DONTNEED test wouldn't be as heavy.
> > > > 
> > > > I reproduced this on the kernel which has no code related to MADV_FREE:
> > > > 
> > > > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> > > > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> > > > MADV_FREE code in there
> > > > + pte_mkdirty patch
> > > > + freeze/unfreeze patch
> > > > + do_page_add_anon_rmap patch
> > > > 
> > > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > > BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:1 val:511
> > > > BUG: Bad rss-counter state mm:ffff88007fdd5b00 idx:2 val:1
> > > 
> > > I have one idea why it could happen, but not sure yet..
> > > 
> > > Could you check if it makes any difference for you?
> > > 
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 5c7b00e88236..194f7f8b8c66 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -103,12 +103,7 @@ void deferred_split_huge_page(struct page *page);
> > >  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> > >  		unsigned long address);
> > >  
> > > -#define split_huge_pmd(__vma, __pmd, __address)				\
> > > -	do {								\
> > > -		pmd_t *____pmd = (__pmd);				\
> > > -		if (pmd_trans_huge(*____pmd))				\
> > > -			__split_huge_pmd(__vma, __pmd, __address);	\
> > > -	}  while (0)
> > > +#define split_huge_pmd(__vma, __pmd, __address)	__split_huge_pmd(__vma, __pmd, __address)
> > 
> > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> > MADV_FREE code in there
> >  + pte_mkdirty patch
> >  + freeze/unfreeze patch
> >  + do_page_add_anon_rmap patch
> >  + above split_huge_pmd
> > 
> > 
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > BUG: Bad rss-counter state mm:ffff88007fa3bb80 idx:1 val:512
> 
> With the patch below my test setup ran for 2+ days without triggering the
> bug. The split_huge_pmd patch should be dropped.
> 
> Please test.
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 14cbbad54a3e..7aa0a3fef2aa 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2841,9 +2841,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	write = pmd_write(*pmd);
>  	young = pmd_young(*pmd);
>  
> -	/* leave pmd empty until pte is filled */
> -	pmdp_huge_clear_flush_notify(vma, haddr, pmd);
> -
>  	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>  	pmd_populate(mm, &_pmd, pgtable);
>  
> @@ -2893,6 +2890,28 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	}
>  
>  	smp_wmb(); /* make pte visible before pmd */
> +	/*
> +	 * Up to this point the pmd is present and huge and userland has the
> +	 * whole access to the hugepage during the split (which happens in
> +	 * place). If we overwrite the pmd with the not-huge version pointing
> +	 * to the pte here (which of course we could if all CPUs were bug
> +	 * free), userland could trigger a small page size TLB miss on the
> +	 * small sized TLB while the hugepage TLB entry is still established in
> +	 * the huge TLB. Some CPU doesn't like that.
> +	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
> +	 * 383 on page 93. Intel should be safe but is also warns that it's
> +	 * only safe if the permission and cache attributes of the two entries
> +	 * loaded in the two TLB is identical (which should be the case here).
> +	 * But it is generally safer to never allow small and huge TLB entries
> +	 * for the same virtual address to be loaded simultaneously. So instead
> +	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
> +	 * current pmd notpresent (atomically because here the pmd_trans_huge
> +	 * and pmd_trans_splitting must remain set at all times on the pmd
> +	 * until the split is complete for this pmd), then we flush the SMP TLB
> +	 * and finally we write the non-huge version of the pmd entry with
> +	 * pmd_populate.
> +	 */
> +	pmdp_invalidate(vma, haddr, pmd);
>  	pmd_populate(mm, pmd, pgtable);
>  
>  	if (freeze) {
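
The ordering argued for in the patch comment (mark the pmd not present,
flush the TLB, only then install the page table) can be illustrated with a
userspace toy model. This is only a sketch under stated assumptions: it is
not kernel code, every toy_* name is hypothetical, and the "TLB" is just two
flags that record which translation sizes are currently cached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_pmd { TOY_PMD_HUGE, TOY_PMD_NOT_PRESENT, TOY_PMD_TABLE };

struct toy_mm {
	enum toy_pmd pmd;
	bool tlb_huge;	/* a huge translation is cached */
	bool tlb_small;	/* a small translation is cached */
};

/* A userland access fills the toy TLB according to the current pmd. */
void toy_access(struct toy_mm *mm)
{
	if (mm->pmd == TOY_PMD_HUGE)
		mm->tlb_huge = true;
	else if (mm->pmd == TOY_PMD_TABLE)
		mm->tlb_small = true;
	/* TOY_PMD_NOT_PRESENT: the access would fault and cache nothing */
}

void toy_flush_tlb(struct toy_mm *mm)
{
	mm->tlb_huge = false;
	mm->tlb_small = false;
}

/* True if huge and small entries for the same address coexist. */
bool toy_tlb_mixed(const struct toy_mm *mm)
{
	return mm->tlb_huge && mm->tlb_small;
}

/* "pmd_populate(); flush" order: an access in between mixes sizes. */
bool toy_split_populate_then_flush(struct toy_mm *mm)
{
	bool mixed;

	assert(mm->pmd == TOY_PMD_HUGE);
	mm->pmd = TOY_PMD_TABLE;
	toy_access(mm);			/* racing userland access */
	mixed = toy_tlb_mixed(mm);
	toy_flush_tlb(mm);
	return mixed;
}

/* Invalidate, flush, then populate: the order the patch comment describes. */
bool toy_split_invalidate_first(struct toy_mm *mm)
{
	bool mixed = false;

	assert(mm->pmd == TOY_PMD_HUGE);
	mm->pmd = TOY_PMD_NOT_PRESENT;	/* stands in for the invalidate step */
	toy_access(mm);			/* racing access caches nothing */
	mixed |= toy_tlb_mixed(mm);
	toy_flush_tlb(mm);		/* the stale huge entry is gone */
	mm->pmd = TOY_PMD_TABLE;	/* stands in for pmd_populate() */
	toy_access(mm);
	mixed |= toy_tlb_mixed(mm);
	return mixed;
}
```

Starting from a state with pmd == TOY_PMD_HUGE and tlb_huge set, the first
helper reports a mixed TLB and the second does not. The model says nothing
about the rss-counter bug itself, only about the TLB ordering.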

I have been testing this patch with MADV_DONTNEED for a few days and
haven't seen the problem any more. I will continue to test it
with MADV_FREE.
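
For reference, the difference between the two hints from userspace can be
sketched as below. This is not the madvise_test program from this thread
(which is not shown here); toy_map_dirty() and toy_discard() are
hypothetical helpers, and the sketch assumes a Linux system with private
anonymous mappings.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map private anonymous memory and dirty all of it. */
unsigned char *toy_map_dirty(size_t len)
{
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0xaa, len);
	return p;
}

/*
 * Hypothetical helper: pass an advice value to madvise().
 * MADV_DONTNEED drops the pages right away, so a later read of private
 * anonymous memory sees zero-filled pages. MADV_FREE, where the headers
 * define it, only marks the pages as freeable; they are actually dropped
 * later under memory pressure, so a read may still see the old contents.
 */
int toy_discard(unsigned char *p, size_t len, int advice)
{
	return madvise(p, len, advice);
}
```

With MADV_DONTNEED, reading the range after toy_discard() returns zeroes.
With MADV_FREE the freeing is deferred until reclaim, which is presumably
why that variant of the test depends so much more on memory pressure.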

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-12  0:36                                   ` Minchan Kim
@ 2015-11-16  1:45                                     ` Minchan Kim
  2015-11-16  8:45                                       ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-16  1:45 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Nov 12, 2015 at 09:36:14AM +0900, Minchan Kim wrote:

<snip>

> > > mmotm-2015-10-15-15-20-no-madvise_free, IOW it means git head for
> > > 54bad5da4834 arm64: add pmd_[dirty|mkclean] for THP so there is no
> > > MADV_FREE code in there
> > >  + pte_mkdirty patch
> > >  + freeze/unfreeze patch
> > >  + do_page_add_anon_rmap patch
> > >  + above split_huge_pmd
> > > 
> > > 
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
> > > BUG: Bad rss-counter state mm:ffff88007fa3bb80 idx:1 val:512
> > 
> > With the patch below my test setup ran for 2+ days without triggering the
> > bug. The split_huge_pmd patch should be dropped.
> > 
> > Please test.
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 14cbbad54a3e..7aa0a3fef2aa 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2841,9 +2841,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  	write = pmd_write(*pmd);
> >  	young = pmd_young(*pmd);
> >  
> > -	/* leave pmd empty until pte is filled */
> > -	pmdp_huge_clear_flush_notify(vma, haddr, pmd);
> > -
> >  	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
> >  	pmd_populate(mm, &_pmd, pgtable);
> >  
> > @@ -2893,6 +2890,28 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  	}
> >  
> >  	smp_wmb(); /* make pte visible before pmd */
> > +	/*
> > +	 * Up to this point the pmd is present and huge and userland has the
> > +	 * whole access to the hugepage during the split (which happens in
> > +	 * place). If we overwrite the pmd with the not-huge version pointing
> > +	 * to the pte here (which of course we could if all CPUs were bug
> > +	 * free), userland could trigger a small page size TLB miss on the
> > +	 * small sized TLB while the hugepage TLB entry is still established in
> > +	 * the huge TLB. Some CPU doesn't like that.
> > +	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
> > +	 * 383 on page 93. Intel should be safe but is also warns that it's
> > +	 * only safe if the permission and cache attributes of the two entries
> > +	 * loaded in the two TLB is identical (which should be the case here).
> > +	 * But it is generally safer to never allow small and huge TLB entries
> > +	 * for the same virtual address to be loaded simultaneously. So instead
> > +	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
> > +	 * current pmd notpresent (atomically because here the pmd_trans_huge
> > +	 * and pmd_trans_splitting must remain set at all times on the pmd
> > +	 * until the split is complete for this pmd), then we flush the SMP TLB
> > +	 * and finally we write the non-huge version of the pmd entry with
> > +	 * pmd_populate.
> > +	 */
> > +	pmdp_invalidate(vma, haddr, pmd);
> >  	pmd_populate(mm, pmd, pgtable);
> >  
> >  	if (freeze) {
> 
> I have been testing this patch with MADV_DONTNEED for a few days and
> haven't seen the problem any more. I will continue to test it
> with MADV_FREE.

While testing MADV_FREE on a kernel with your patches applied,
I couldn't see any problem.

However, in this round I ran another test. It is the same as the one
I attached, but a little bit different: it doesn't do the
memcg things/kill/swapoff, so that the test program is long-lived.

With that, I encountered this problem.

page:ffffea0000f60080 count:1 mapcount:0 mapping:ffff88007f584691 index:0x600002a02
flags: 0x400000000006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:ffff880077cf0c00
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:3340!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88006b0f1a40 ti: ffff88004ced4000 task.ti: ffff88004ced4000
RIP: 0010:[<ffffffff8114bf67>]  [<ffffffff8114bf67>] split_huge_page_to_list+0x907/0x920
RSP: 0018:ffff88004ced7a38  EFLAGS: 00010296
RAX: 0000000000000021 RBX: ffffea0000f60080 RCX: ffffffff81830db8
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
RBP: ffff88004ced7ab8 R08: 0000000000000000 R09: ffff8800000bc560
R10: ffffffff8163d880 R11: 0000000000014f25 R12: ffffea0000f60080
R13: ffffea0000f60088 R14: ffffea0000f60080 R15: 0000000000000000
FS:  00007f43d3ced740(0000) GS:ffff8800782e0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff1f6fcdb98 CR3: 000000004cf56000 CR4: 00000000000006a0
Stack:
 cccccccccccccccd ffffea0000f60080 ffff88004ced7ad0 ffffea0000f60088
 ffff88004ced7ad0 0000000000000000 ffff88004ced7ab8 ffffffff810ef9d0
 ffffea0000f60000 0000000000000000 0000000000000000 ffffea0000f60080
Call Trace:
 [<ffffffff810ef9d0>] ? __lock_page+0xa0/0xb0
 [<ffffffff8114c09c>] deferred_split_scan+0x11c/0x260
 [<ffffffff81117bfc>] ? list_lru_count_one+0x1c/0x30
 [<ffffffff81101333>] shrink_slab.part.42+0x1e3/0x350
 [<ffffffff81105daa>] shrink_zone+0x26a/0x280
 [<ffffffff81105eed>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff81106224>] try_to_free_pages+0xb4/0x140
 [<ffffffff810f8a59>] __alloc_pages_nodemask+0x459/0x920
 [<ffffffff8111e667>] handle_mm_fault+0xc77/0x1000
 [<ffffffff8142718d>] ? retint_kernel+0x10/0x10
 [<ffffffff81033629>] __do_page_fault+0x189/0x400
 [<ffffffff810338ac>] do_page_fault+0xc/0x10
 [<ffffffff81428142>] page_fault+0x22/0x30
Code: ff ff 48 c7 c6 f0 b2 77 81 4c 89 f7 e8 13 c3 fc ff 0f 0b 48 83 e8 01 e9 88 f7 ff ff 48 c7 c6 70 a1 77 81 4c 89 f7 e8 f9 c2 fc ff <0f> 0b 48 c7 c6 38 af 77 81 4c 89 e7 e8 e8 c2 fc ff 0f 0b 66 0f 
RIP  [<ffffffff8114bf67>] split_huge_page_to_list+0x907/0x920
 RSP <ffff88004ced7a38>
---[ end trace c9a60522e3a296e4 ]---


So I reverted all the MADV_FREE patches and changed the test to use MADV_DONTNEED.
This time I saw the oops below.
If I am missing something, please let me know.

------------[ cut here ]------------
kernel BUG at include/linux/swapops.h:129!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 5 PID: 1563 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-no-madv-free+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88007e8d3480 ti: ffff88007f524000 task.ti: ffff88007f524000
RIP: 0010:[<ffffffff811504be>]  [<ffffffff811504be>] migration_entry_to_page.part.61+0x4/0x6
RSP: 0018:ffff88007f527cd0  EFLAGS: 00010246
RAX: ffffea0000896b00 RBX: 00006000013ac000 RCX: ffffea0000000000
RDX: 0000000000000000 RSI: ffffea0001f93e80 RDI: 3e000000000225ac
RBP: ffff88007f527cd0 R08: 0000000000000101 R09: ffff88007e4fa000
R10: ffffea0001fda740 R11: 0000000000000000 R12: 00000000044b583e
R13: 00006000013ad000 R14: ffff88007f527e00 R15: ffff88007e4fad60
FS:  00007fe2f099a740(0000) GS:ffff8800782a0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000166c0d0 CR3: 000000007e57b000 CR4: 00000000000006a0
Stack:
 ffff88007f527db8 ffffffff81118030 00006000017fffff ffff88007f527e00
 00006000017fffff ffff88007ed71000 ffff88007e57b600 0000600001800000
 0000600001800000 00006000017fffff 0000600001800000 ffff88007efb6b78
Call Trace:
 [<ffffffff81118030>] unmap_single_vma+0x840/0x880
 [<ffffffff811188a1>] unmap_vmas+0x41/0x60
 [<ffffffff8111dfad>] unmap_region+0x9d/0x100
 [<ffffffff81120007>] do_munmap+0x217/0x380
 [<ffffffff811201b1>] vm_munmap+0x41/0x60
 [<ffffffff811210d2>] SyS_munmap+0x22/0x30
 [<ffffffff81420357>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: df 48 c1 ff 06 49 01 fc 4c 89 e7 e8 9c ff ff ff 85 c0 74 0c 4c 89 e0 48 c1 e0 06 48 29 d8 eb 02 31 c0 5b 41 5c 5d c3 55 48 89 e5 <0f> 0b 55 48 c7 c6 30 80 77 81 48 89 e5 e8 f0 45 fc ff 0f 0b 55 
RIP  [<ffffffff811504be>] migration_entry_to_page.part.61+0x4/0x6
 RSP <ffff88007f527cd0>
---[ end trace 01097fb7f9cf1b6c ]---

Another hit:

page:ffffea0000520080 count:2 mapcount:0 mapping:ffff880072b38a51 index:0x600002602
flags: 0x4000000000048028(uptodate|lru|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:ffff880077cf0c00
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:3306!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 6 PID: 1419 Comm: madvise_test Not tainted 4.3.0-rc5-mm1-no-madv-free+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88006f108000 ti: ffff88006f054000 task.ti: ffff88006f054000
RIP: 0010:[<ffffffff811473bf>]  [<ffffffff811473bf>] split_huge_page_to_list+0x81f/0x890
RSP: 0000:ffff88006f057a40  EFLAGS: 00010282
RAX: 0000000000000021 RBX: ffffea0000520080 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821dd418
RBP: ffff88006f057ab8 R08: 0000000000000000 R09: ffff8800000bfb20
R10: ffffffff8163d1c0 R11: 0000000000005c5f R12: ffff88006f057ad0
R13: ffffea0000520080 R14: ffffea0000520080 R15: 0000000000000000
FS:  00007f09963a2740(0000) GS:ffff8800782c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000600003d92000 CR3: 000000007372e000 CR4: 00000000000006a0
Stack:
 ffffea0000520080 ffff88006f057ad0 ffffea0000520088 ffff88006f057ad0
 0000000000000000 ffff88006f057ab8 ffffffff810ec700 ffffea0000520000
 0000000000000000 0000000000000000 ffffea0000520080 ffff88006f057ad0
Call Trace:
 [<ffffffff810ec700>] ? __lock_page+0xa0/0xb0
 [<ffffffff81147545>] deferred_split_scan+0x115/0x240
 [<ffffffff8111445c>] ? list_lru_count_one+0x1c/0x30
 [<ffffffff810fdd63>] shrink_slab.part.43+0x1e3/0x350
 [<ffffffff81102788>] shrink_zone+0x238/0x250
 [<ffffffff811028cd>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff81102c04>] try_to_free_pages+0xb4/0x140
 [<ffffffff810f57b9>] __alloc_pages_nodemask+0x459/0x920
 [<ffffffff8111aa2a>] handle_mm_fault+0xbca/0xf90
 [<ffffffff8105b8bc>] ? enqueue_task+0x3c/0x60
 [<ffffffff810602eb>] ? __set_cpus_allowed_ptr+0x9b/0x1a0
 [<ffffffff81032b49>] __do_page_fault+0x189/0x400
 [<ffffffff81032dcc>] do_page_fault+0xc/0x10
 [<ffffffff81421e02>] page_fault+0x22/0x30
Code: ff ff 48 c7 c6 d0 91 77 81 4c 89 f7 e8 1b d7 fc ff 0f 0b 48 83 e8 01 e9 70 f8 ff ff 48 c7 c6 50 80 77 81 4c 89 f7 e8 01 d7 fc ff <0f> 0b 48 c7 c6 d8 be 77 81 4c 89 ef e8 f0 d6 fc ff 0f 0b 48 83 
RIP  [<ffffffff811473bf>] split_huge_page_to_list+0x81f/0x890
 RSP <ffff88006f057a40>
---[ end trace 0ce8751b8410cd8e ]---


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-16  1:45                                     ` Minchan Kim
@ 2015-11-16  8:45                                       ` Kirill A. Shutemov
  2015-11-16 10:32                                         ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-16  8:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> While testing MADV_FREE on a kernel with your patches applied,
> I couldn't see any problem.
> 
> However, in this round I ran another test. It is the same as the one
> I attached, but a little bit different: it doesn't do the
> memcg things/kill/swapoff, so that the test program is long-lived.

Could you share updated test?

And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

> With that, I encountered this problem.
> 
> page:ffffea0000f60080 count:1 mapcount:0 mapping:ffff88007f584691 index:0x600002a02
> flags: 0x400000000006a028(uptodate|lru|writeback|swapcache|reclaim|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:ffff880077cf0c00
> ------------[ cut here ]------------
> kernel BUG at mm/huge_memory.c:3340!
> invalid opcode: 0000 [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 7 PID: 1657 Comm: memhog Not tainted 4.3.0-rc5-mm1-madv-free+ #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88006b0f1a40 ti: ffff88004ced4000 task.ti: ffff88004ced4000
> RIP: 0010:[<ffffffff8114bf67>]  [<ffffffff8114bf67>] split_huge_page_to_list+0x907/0x920
> RSP: 0018:ffff88004ced7a38  EFLAGS: 00010296
> RAX: 0000000000000021 RBX: ffffea0000f60080 RCX: ffffffff81830db8
> RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
> RBP: ffff88004ced7ab8 R08: 0000000000000000 R09: ffff8800000bc560
> R10: ffffffff8163d880 R11: 0000000000014f25 R12: ffffea0000f60080
> R13: ffffea0000f60088 R14: ffffea0000f60080 R15: 0000000000000000
> FS:  00007f43d3ced740(0000) GS:ffff8800782e0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ff1f6fcdb98 CR3: 000000004cf56000 CR4: 00000000000006a0
> Stack:
>  cccccccccccccccd ffffea0000f60080 ffff88004ced7ad0 ffffea0000f60088
>  ffff88004ced7ad0 0000000000000000 ffff88004ced7ab8 ffffffff810ef9d0
>  ffffea0000f60000 0000000000000000 0000000000000000 ffffea0000f60080
> Call Trace:
>  [<ffffffff810ef9d0>] ? __lock_page+0xa0/0xb0
>  [<ffffffff8114c09c>] deferred_split_scan+0x11c/0x260
>  [<ffffffff81117bfc>] ? list_lru_count_one+0x1c/0x30
>  [<ffffffff81101333>] shrink_slab.part.42+0x1e3/0x350
>  [<ffffffff81105daa>] shrink_zone+0x26a/0x280
>  [<ffffffff81105eed>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff81106224>] try_to_free_pages+0xb4/0x140
>  [<ffffffff810f8a59>] __alloc_pages_nodemask+0x459/0x920
>  [<ffffffff8111e667>] handle_mm_fault+0xc77/0x1000
>  [<ffffffff8142718d>] ? retint_kernel+0x10/0x10
>  [<ffffffff81033629>] __do_page_fault+0x189/0x400
>  [<ffffffff810338ac>] do_page_fault+0xc/0x10
>  [<ffffffff81428142>] page_fault+0x22/0x30
> Code: ff ff 48 c7 c6 f0 b2 77 81 4c 89 f7 e8 13 c3 fc ff 0f 0b 48 83 e8 01 e9 88 f7 ff ff 48 c7 c6 70 a1 77 81 4c 89 f7 e8 f9 c2 fc ff <0f> 0b 48 c7 c6 38 af 77 81 4c 89 e7 e8 e8 c2 fc ff 0f 0b 66 0f 
> RIP  [<ffffffff8114bf67>] split_huge_page_to_list+0x907/0x920
>  RSP <ffff88004ced7a38>
> ---[ end trace c9a60522e3a296e4 ]---

I don't see how it's possible: we call lock_page() just before
split_huge_page() in deferred_split_scan().
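
To spell out the invariant: the split path asserts that the page is locked,
and the scan path takes the lock right before calling it. A userspace sketch
of that pattern, assuming a single page object and with all toy_* names
hypothetical (this is not the code in mm/huge_memory.c):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a page: a lock bit and a "was split" marker. */
struct toy_page {
	atomic_bool locked;
	int split;
};

/* Spin until the lock bit is taken; stands in for lock_page(). */
void toy_lock_page(struct toy_page *page)
{
	bool expected = false;

	while (!atomic_compare_exchange_weak(&page->locked, &expected, true))
		expected = false;	/* CAS failure overwrote it; reset */
}

void toy_unlock_page(struct toy_page *page)
{
	atomic_store(&page->locked, false);
}

/*
 * Refuses to split an unlocked page. The kernel trips
 * VM_BUG_ON_PAGE(!PageLocked(page)) at this point instead of
 * returning an error.
 */
int toy_split_huge_page(struct toy_page *page)
{
	if (!atomic_load(&page->locked))
		return -1;
	page->split = 1;
	return 0;
}

/* One iteration of the scan: lock, split, unlock. */
int toy_deferred_split_one(struct toy_page *page)
{
	int ret;

	toy_lock_page(page);
	ret = toy_split_huge_page(page);
	toy_unlock_page(page);
	return ret;
}
```

In this model the check can only fail when the lock and the check refer to
different objects, or when someone else unlocks in between. The sketch does
not show which of these, if either, happens in the reported oops.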

> So I reverted all the MADV_FREE patches and changed the test to use MADV_DONTNEED.
> This time I saw the oops below.
> If I am missing something, please let me know.
> 
> ------------[ cut here ]------------
> kernel BUG at include/linux/swapops.h:129!

Looks similar to what I fixed by inserting smp_wmb() just before
clear_compound_head() in __split_huge_page_tail().

Do you have this in place? Like in the latest -mm tree?
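
The idea behind that barrier, sketched in userspace C11: initialise the
tail page's fields first, then publish it by clearing compound_head, with a
fence in between so that an observer who sees the cleared field also sees
the initialised fields. This is a toy model, not __split_huge_page_tail();
the toy_* names are hypothetical, and a release fence is used as a stand-in
that is somewhat stronger than smp_wmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy stand-in for a tail page. */
struct toy_tail {
	unsigned long flags;
	unsigned long index;
	_Atomic uintptr_t compound_head;	/* nonzero: still a tail page */
};

/* Writer side: fill in the fields, fence, then publish. */
void toy_split_tail(struct toy_tail *t, unsigned long flags,
		    unsigned long index)
{
	t->flags = flags;
	t->index = index;
	/* Plays the role of smp_wmb(): earlier stores become visible first. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&t->compound_head, 0, memory_order_relaxed);
}

/*
 * Reader side: returns 1 and copies the fields only once the page is
 * observed as no longer being a tail page, 0 otherwise.
 */
int toy_read_tail(struct toy_tail *t, unsigned long *flags,
		  unsigned long *index)
{
	if (atomic_load_explicit(&t->compound_head,
				 memory_order_acquire) != 0)
		return 0;
	*flags = t->flags;
	*index = t->index;
	return 1;
}
```

Without the fence, the store that clears compound_head could become visible
before the field stores, and a concurrent reader could act on a
half-initialised page.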

> Another hit:
> 
> page:ffffea0000520080 count:2 mapcount:0 mapping:ffff880072b38a51 index:0x600002602
> flags: 0x4000000000048028(uptodate|lru|swapcache|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:ffff880077cf0c00
> ------------[ cut here ]------------
> kernel BUG at mm/huge_memory.c:3306!

The same as the first one: no idea.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-16  8:45                                       ` Kirill A. Shutemov
@ 2015-11-16 10:32                                         ` Minchan Kim
  2015-11-16 10:54                                           ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-16 10:32 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > While testing MADV_FREE on a kernel with your patches applied,
> > I couldn't see any problem.
> > 
> > However, in this round I ran another test. It is the same as the one
> > I attached, but a little bit different: it doesn't do the
> > memcg things/kill/swapoff, so that the test program is long-lived.
> 
> Could you share updated test?

It's part of my test suite, so I need to factor it out first.
I will send it when I get to the office tomorrow.

> 
> And could you try to reproduce it on clean mmotm-2015-11-10-15-53?

Before leaving the office I queued it up, and the result is below.
It seems you have already fixed it but the fix isn't in mmotm yet. Right?
Anyway, please confirm, and tell me which additional patches I should
apply on top of mmotm-2015-11-10-15-53 to pick up your many recent bug
fixes.

Thanks.

page:ffffea0000553fc0 count:3 mapcount:1 mapping:ffff88007f717a01 index:0x6000002ff
flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
page->mem_cgroup:ffff880077cf0c00
------------[ cut here ]------------
kernel BUG at mm/migrate.c:889!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 10 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880073441a40 ti: ffff88007344c000 task.ti: ffff88007344c000
RIP: 0010:[<ffffffff81145466>]  [<ffffffff81145466>] migrate_pages+0x8e6/0x950
RSP: 0018:ffff88007344fa00  EFLAGS: 00010282
RAX: 0000000000000021 RBX: ffffea0001a0bbc0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
RBP: ffff88007344fa80 R08: 0000000000000000 R09: ffff8800000b9540
R10: ffffffff8163e2c0 R11: 00000000000002c2 R12: 0000000000000000
R13: ffffea0000553f80 R14: ffffea0000553fc0 R15: ffffffff8189db40
FS:  0000000000000000(0000) GS:ffff880078340000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f45cc0091d8 CR3: 000000007eba7000 CR4: 00000000000006a0
Stack:
 ffff880073441a40 0000000000000000 0000000000000000 0000000000000000
 ffffffff81114880 0000000000000000 ffffffff81116420 ffffea0000553fe0
 ffff88007344fb30 ffff88007344fb20 0000000000000000 ffff88007344fb20
Call Trace:
 [<ffffffff81114880>] ? trace_raw_output_mm_compaction_defer_template+0xc0/0xc0
 [<ffffffff81116420>] ? isolate_freepages_block+0x3d0/0x3d0
 [<ffffffff81116dfb>] compact_zone+0x2bb/0x720
 [<ffffffff8128793d>] ? list_del+0xd/0x30
 [<ffffffff811172cd>] compact_zone_order+0x6d/0xa0
 [<ffffffff8111751d>] try_to_compact_pages+0xed/0x200
 [<ffffffff81154143>] __alloc_pages_direct_compact+0x3b/0xd4
 [<ffffffff810f921b>] __alloc_pages_nodemask+0x3fb/0x920
 [<ffffffff81147465>] khugepaged+0x155/0x1b10
 [<ffffffff81073ca0>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81147310>] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [<ffffffff81057e49>] kthread+0xc9/0xe0
 [<ffffffff81057d80>] ? kthread_park+0x60/0x60
 [<ffffffff8142aa6f>] ret_from_fork+0x3f/0x70
 [<ffffffff81057d80>] ? kthread_park+0x60/0x60
Code: 44 c6 48 8b 40 08 83 e0 03 48 83 f8 03 0f 84 fd fa ff ff 4d 85 e4 0f 85 f4 fa ff ff 48 c7 c6 b8 f6 77 81 4c 89 f7 e8 fa 36 fd ff <0f> 0b 48 83 e8 01 e9 d0 fa ff ff f6 40 07 01 0f 84 5b fd ff ff 
RIP  [<ffffffff81145466>] migrate_pages+0x8e6/0x950
 RSP <ffff88007344fa00>
---[ end trace 337555313b7e45be ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-16 10:32                                         ` Minchan Kim
@ 2015-11-16 10:54                                           ` Kirill A. Shutemov
  2015-11-17  7:35                                             ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-16 10:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > While testing MADV_FREE on a kernel with your patches applied,
> > > I couldn't see any problem.
> > > 
> > > However, in this round I ran another test. It is the same as the one
> > > I attached, but a little bit different: it doesn't do the
> > > memcg things/kill/swapoff, so that the test program is long-lived.
> > 
> > Could you share updated test?
> 
> It's part of my test suite, so I need to factor it out first.
> I will send it when I get to the office tomorrow.

Thanks.

> > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> 
> Before leaving the office I queued it up, and the result is below.
> It seems you have already fixed it but the fix isn't in mmotm yet. Right?
> Anyway, please confirm, and tell me which additional patches I should
> apply on top of mmotm-2015-11-10-15-53 to pick up your many recent bug
> fixes.

These two patches of mine are not in the mmotm-2015-11-10-15-53 release:

http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shutemov@linux.intel.com
http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shutemov@linux.intel.com

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-16 10:54                                           ` Kirill A. Shutemov
@ 2015-11-17  7:35                                             ` Minchan Kim
  2015-11-17  9:32                                               ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-17  7:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > While testing MADV_FREE on a kernel with your patches applied,
> > > > I couldn't see any problem.
> > > > 
> > > > However, in this round I ran another test. It is the same as the one
> > > > I attached, but a little bit different: it doesn't do the
> > > > memcg things/kill/swapoff, so that the test program is long-lived.
> > > 
> > > Could you share updated test?
> > 
> > It's part of my test suite, so I need to factor it out first.
> > I will send it when I get to the office tomorrow.
> 
> Thanks.
> 
> > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > 
> > Before leaving the office I queued it up, and the result is below.
> > It seems you have already fixed it but the fix isn't in mmotm yet. Right?
> > Anyway, please confirm, and tell me which additional patches I should
> > apply on top of mmotm-2015-11-10-15-53 to pick up your many recent bug
> > fixes.
> 
> These two patches of mine are not in the mmotm-2015-11-10-15-53 release:
> 
> http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shutemov@linux.intel.com
> http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shutemov@linux.intel.com

1. mm: fix __page_mapcount()
2. thp: fix leak due split_huge_page() vs. exit race

If I missed any patches, let me know.

I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
Unfortunately, the result is below.

I am now preparing a test program I can send you, but it is not easy: the
small changes needed to factor it out of the test suite seem to change
something (e.g. timing) and make the problem hard to reproduce. I will keep trying.


page:ffffea0000240080 count:2 mapcount:1 mapping:ffff88007eff3321 index:0x600000e02
flags: 0x4000000000040018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:ffff880077cf0c00
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:3272!
invalid opcode: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880073441a40 ti: ffff88007344c000 task.ti: ffff88007344c000
RIP: 0010:[<ffffffff8114bc9b>]  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
RSP: 0018:ffff88007344f968  EFLAGS: 00010286
RAX: 0000000000000021 RBX: ffffea0000240080 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
RBP: ffff88007344f9e8 R08: 0000000000000000 R09: ffff8800000bc600
R10: ffffffff8163e2c0 R11: 0000000000004b47 R12: ffffea0000240080
R13: ffffea0000240088 R14: ffffea0000240080 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880078300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ffd59edcd68 CR3: 0000000001808000 CR4: 00000000000006a0
Stack:
 cccccccccccccccd ffffea0000240080 ffff88007344fa00 ffffea0000240088
 ffff88007344fa00 0000000000000000 ffff88007344f9e8 ffffffff810f0200
 ffffea0000240000 0000000000000000 0000000000000000 ffffea0000240080
Call Trace:
 [<ffffffff810f0200>] ? __lock_page+0xa0/0xb0
 [<ffffffff8114bdc5>] deferred_split_scan+0x115/0x240
 [<ffffffff8111851c>] ? list_lru_count_one+0x1c/0x30
 [<ffffffff811018d3>] shrink_slab.part.42+0x1e3/0x350
 [<ffffffff8110644a>] shrink_zone+0x26a/0x280
 [<ffffffff8110658d>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff811068c4>] try_to_free_pages+0xb4/0x140
 [<ffffffff810f9279>] __alloc_pages_nodemask+0x459/0x920
 [<ffffffff8108d750>] ? trace_event_raw_event_tick_stop+0xd0/0xd0
 [<ffffffff81147465>] khugepaged+0x155/0x1b10
 [<ffffffff81073ca0>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81147310>] ? __split_huge_pmd_locked+0x4e0/0x4e0
 [<ffffffff81057e49>] kthread+0xc9/0xe0
 [<ffffffff81057d80>] ? kthread_park+0x60/0x60
 [<ffffffff8142aa6f>] ret_from_fork+0x3f/0x70
 [<ffffffff81057d80>] ? kthread_park+0x60/0x60
Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
RIP  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
 RSP <ffff88007344f968>
---[ end trace 0ee39378e850d8de ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-17  7:35                                             ` Minchan Kim
@ 2015-11-17  9:32                                               ` Kirill A. Shutemov
  2015-11-19  2:12                                                 ` Minchan Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-17  9:32 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > During the test with MADV_FREE on a kernel with your patches applied,
> > > > > I couldn't see any problem.
> > > > > 
> > > > > However, in this round, I did another test which is the same one
> > > > > I attached but a little bit different because it doesn't do
> > > > > (memcg things/kill/swapoff), for a long-lived run of the test program.
> > > > 
> > > > Could you share updated test?
> > > 
> > > It's part of my testing suite so I should factor it out.
> > > I will send it when I go to office tomorrow.
> > 
> > Thanks.
> > 
> > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > 
> > > > Before leaving the office, I queued it up and the result is below.
> > > > It seems you already fixed it but the fix isn't in mmotm yet. Right?
> > > > Anyway, please confirm and tell me which additional patches I should
> > > > apply on top of mmotm-2015-11-10-15-53 to follow up on your many
> > > > recent bug fix patches.
> > > 
> > > My two patches which are not in the mmotm-2015-11-10-15-53 release:
> > 
> > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shutemov@linux.intel.com
> > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shutemov@linux.intel.com
> 
> 1. mm: fix __page_mapcount()
> 2. thp: fix leak due split_huge_page() vs. exit race
> 
> If I missed any patches, let me know.
> 
> I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
> But unfortunately, the result is below.
> 
> Now I am writing a test program I can send to you, but it is not easy
> because small changes made while factoring it out of the testing suite seem to
> change something (e.g., timing) and make the bug hard to reproduce. I will try again.

Your test suite seems to generate quite a few bug reports. Would you mind making
the whole suite public?
 
> page:ffffea0000240080 count:2 mapcount:1 mapping:ffff88007eff3321 index:0x600000e02
> flags: 0x4000000000040018(uptodate|dirty|swapbacked)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> page->mem_cgroup:ffff880077cf0c00
> ------------[ cut here ]------------
> kernel BUG at mm/huge_memory.c:3272!
> invalid opcode: 0000 [#1] SMP 
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff880073441a40 ti: ffff88007344c000 task.ti: ffff88007344c000
> RIP: 0010:[<ffffffff8114bc9b>]  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
> RSP: 0018:ffff88007344f968  EFLAGS: 00010286
> RAX: 0000000000000021 RBX: ffffea0000240080 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
> RBP: ffff88007344f9e8 R08: 0000000000000000 R09: ffff8800000bc600
> R10: ffffffff8163e2c0 R11: 0000000000004b47 R12: ffffea0000240080
> R13: ffffea0000240088 R14: ffffea0000240080 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff880078300000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007ffd59edcd68 CR3: 0000000001808000 CR4: 00000000000006a0
> Stack:
>  cccccccccccccccd ffffea0000240080 ffff88007344fa00 ffffea0000240088
>  ffff88007344fa00 0000000000000000 ffff88007344f9e8 ffffffff810f0200
>  ffffea0000240000 0000000000000000 0000000000000000 ffffea0000240080
> Call Trace:
>  [<ffffffff810f0200>] ? __lock_page+0xa0/0xb0
>  [<ffffffff8114bdc5>] deferred_split_scan+0x115/0x240
>  [<ffffffff8111851c>] ? list_lru_count_one+0x1c/0x30
>  [<ffffffff811018d3>] shrink_slab.part.42+0x1e3/0x350
>  [<ffffffff8110644a>] shrink_zone+0x26a/0x280
>  [<ffffffff8110658d>] do_try_to_free_pages+0x12d/0x3b0
>  [<ffffffff811068c4>] try_to_free_pages+0xb4/0x140
>  [<ffffffff810f9279>] __alloc_pages_nodemask+0x459/0x920
>  [<ffffffff8108d750>] ? trace_event_raw_event_tick_stop+0xd0/0xd0
>  [<ffffffff81147465>] khugepaged+0x155/0x1b10
>  [<ffffffff81073ca0>] ? prepare_to_wait_event+0xf0/0xf0
>  [<ffffffff81147310>] ? __split_huge_pmd_locked+0x4e0/0x4e0
>  [<ffffffff81057e49>] kthread+0xc9/0xe0
>  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
>  [<ffffffff8142aa6f>] ret_from_fork+0x3f/0x70
>  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
> Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> RIP  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
>  RSP <ffff88007344f968>
> ---[ end trace 0ee39378e850d8de ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled

I looked more into it. It seems to be a race between split_huge_page() and
deferred_split_scan(), as the dumped page is not huge.
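
To make the suspected ordering concrete, here is a minimal userspace sketch.
It is not kernel code: model_page, model_scan_pin() and model_try_split() are
hypothetical names, the spinlock built on atomic_flag only stands in for
split_queue_lock, and it assumes the scanner takes its extra reference while
holding that lock. It illustrates why checking the counts and dequeuing the
page under one lock leaves no window for the scanner to pin a page that has
already been judged safe to split.

```c
/* Userspace model only; every name here is hypothetical, not kernel API. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_page {
	int count;      /* stands in for page_count(head) */
	int mapcount;   /* stands in for total_mapcount(head) */
	bool on_queue;  /* page sits on the deferred split queue */
	bool is_huge;   /* cleared once the page has been split */
};

/* Stands in for split_queue_lock. */
static atomic_flag model_queue_lock = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
	while (atomic_flag_test_and_set(&model_queue_lock))
		;	/* spin until the flag was clear */
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_queue_lock);
}

/* Scanner side: pin a page only while it is still queued (assumed behaviour). */
bool model_scan_pin(struct model_page *p)
{
	bool pinned = false;

	model_lock();
	if (p->on_queue) {
		p->count++;
		pinned = true;
	}
	model_unlock();
	return pinned;
}

/* Splitter side: check the counts and dequeue under the same lock. */
int model_try_split(struct model_page *p)
{
	model_lock();
	if (p->mapcount != p->count - 1) {
		model_unlock();
		return -EBUSY;		/* someone else holds a reference */
	}
	p->on_queue = false;		/* the scanner can no longer find it */
	model_unlock();
	p->is_huge = false;		/* stands in for __split_huge_page() */
	return 0;
}
```

In this model, calling model_try_split() first makes a later model_scan_pin()
return false, and calling model_scan_pin() first makes model_try_split()
return -EBUSY because the extra reference breaks the mapcount == count - 1
condition.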

Could you check if the patch below makes any difference to the situation?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 91e2f4b7ca39..923c0f6eb50a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3186,13 +3186,6 @@ static void __split_huge_page(struct page *page, struct list_head *list)
 	spin_lock_irq(&zone->lru_lock);
 	lruvec = mem_cgroup_page_lruvec(head, zone);
 
-	spin_lock(&split_queue_lock);
-	if (!list_empty(page_deferred_list(head))) {
-		split_queue_len--;
-		list_del(page_deferred_list(head));
-	}
-	spin_unlock(&split_queue_lock);
-
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
 
@@ -3299,12 +3292,20 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	freeze_page(anon_vma, head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
+	/* Prevent deferred_split_scan() touching ->_count */
+	spin_lock(&split_queue_lock);
 	count = page_count(head);
 	mapcount = total_mapcount(head);
 	if (mapcount == count - 1) {
+		if (!list_empty(page_deferred_list(head))) {
+			split_queue_len--;
+			list_del(page_deferred_list(head));
+		}
+		spin_unlock(&split_queue_lock);
 		__split_huge_page(page, list);
 		ret = 0;
 	} else if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount > count - 1) {
+		spin_unlock(&split_queue_lock);
 		pr_alert("total_mapcount: %u, page_count(): %u\n",
 				mapcount, count);
 		if (PageTail(page))
@@ -3312,6 +3313,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		dump_page(page, "total_mapcount(head) > page_count(head) - 1");
 		BUG();
 	} else {
+		spin_unlock(&split_queue_lock);
 		unfreeze_page(anon_vma, head);
 		ret = -EBUSY;
 	}
-- 
 Kirill A. Shutemov

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-17  9:32                                               ` Kirill A. Shutemov
@ 2015-11-19  2:12                                                 ` Minchan Kim
  2015-11-19  6:58                                                   ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2015-11-19  2:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > During the test with MADV_FREE on a kernel with your patches applied,
> > > > > > I couldn't see any problem.
> > > > > > 
> > > > > > However, in this round, I did another test which is the same one
> > > > > > I attached but a little bit different because it doesn't do
> > > > > > (memcg things/kill/swapoff), for a long-lived run of the test program.
> > > > > 
> > > > > Could you share updated test?
> > > > 
> > > > It's part of my testing suite so I should factor it out.
> > > > I will send it when I go to office tomorrow.
> > > 
> > > Thanks.
> > > 
> > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > 
> > > > > Before leaving the office, I queued it up and the result is below.
> > > > > It seems you already fixed it but the fix isn't in mmotm yet. Right?
> > > > > Anyway, please confirm and tell me which additional patches I should
> > > > > apply on top of mmotm-2015-11-10-15-53 to follow up on your many
> > > > > recent bug fix patches.
> > > > 
> > > > My two patches which are not in the mmotm-2015-11-10-15-53 release:
> > > 
> > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shutemov@linux.intel.com
> > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shutemov@linux.intel.com
> > 
> > 1. mm: fix __page_mapcount()
> > 2. thp: fix leak due split_huge_page() vs. exit race
> > 
> > If I missed any patches, let me know.
> > 
> > I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
> > But unfortunately, the result is below.
> > 
> > Now I am writing a test program I can send to you, but it is not easy
> > because small changes made while factoring it out of the testing suite seem to
> > change something (e.g., timing) and make the bug hard to reproduce. I will try again.
> 
> Your test suite seems to generate quite a few bug reports. Would you mind making
> the whole suite public?

That's tough because it includes company-internal stuff.
That's why I am trying to factor out the part I can share, but unfortunately
I haven't found the time to retry until now. :(

>  
> > page:ffffea0000240080 count:2 mapcount:1 mapping:ffff88007eff3321 index:0x600000e02
> > flags: 0x4000000000040018(uptodate|dirty|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > page->mem_cgroup:ffff880077cf0c00
> > ------------[ cut here ]------------
> > kernel BUG at mm/huge_memory.c:3272!
> > invalid opcode: 0000 [#1] SMP 
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff880073441a40 ti: ffff88007344c000 task.ti: ffff88007344c000
> > RIP: 0010:[<ffffffff8114bc9b>]  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
> > RSP: 0018:ffff88007344f968  EFLAGS: 00010286
> > RAX: 0000000000000021 RBX: ffffea0000240080 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
> > RBP: ffff88007344f9e8 R08: 0000000000000000 R09: ffff8800000bc600
> > R10: ffffffff8163e2c0 R11: 0000000000004b47 R12: ffffea0000240080
> > R13: ffffea0000240088 R14: ffffea0000240080 R15: 0000000000000000
> > FS:  0000000000000000(0000) GS:ffff880078300000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007ffd59edcd68 CR3: 0000000001808000 CR4: 00000000000006a0
> > Stack:
> >  cccccccccccccccd ffffea0000240080 ffff88007344fa00 ffffea0000240088
> >  ffff88007344fa00 0000000000000000 ffff88007344f9e8 ffffffff810f0200
> >  ffffea0000240000 0000000000000000 0000000000000000 ffffea0000240080
> > Call Trace:
> >  [<ffffffff810f0200>] ? __lock_page+0xa0/0xb0
> >  [<ffffffff8114bdc5>] deferred_split_scan+0x115/0x240
> >  [<ffffffff8111851c>] ? list_lru_count_one+0x1c/0x30
> >  [<ffffffff811018d3>] shrink_slab.part.42+0x1e3/0x350
> >  [<ffffffff8110644a>] shrink_zone+0x26a/0x280
> >  [<ffffffff8110658d>] do_try_to_free_pages+0x12d/0x3b0
> >  [<ffffffff811068c4>] try_to_free_pages+0xb4/0x140
> >  [<ffffffff810f9279>] __alloc_pages_nodemask+0x459/0x920
> >  [<ffffffff8108d750>] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> >  [<ffffffff81147465>] khugepaged+0x155/0x1b10
> >  [<ffffffff81073ca0>] ? prepare_to_wait_event+0xf0/0xf0
> >  [<ffffffff81147310>] ? __split_huge_pmd_locked+0x4e0/0x4e0
> >  [<ffffffff81057e49>] kthread+0xc9/0xe0
> >  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
> >  [<ffffffff8142aa6f>] ret_from_fork+0x3f/0x70
> >  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
> > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> > RIP  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
> >  RSP <ffff88007344f968>
> > ---[ end trace 0ee39378e850d8de ]---
> > Kernel panic - not syncing: Fatal exception
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
> 
> I looked more into it. It seems to be a race between split_huge_page() and
> deferred_split_scan(), as the dumped page is not huge.
> 
> Could you check if the patch below makes any difference to the situation?
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 91e2f4b7ca39..923c0f6eb50a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3186,13 +3186,6 @@ static void __split_huge_page(struct page *page, struct list_head *list)
>  	spin_lock_irq(&zone->lru_lock);
>  	lruvec = mem_cgroup_page_lruvec(head, zone);
>  
> -	spin_lock(&split_queue_lock);
> -	if (!list_empty(page_deferred_list(head))) {
> -		split_queue_len--;
> -		list_del(page_deferred_list(head));
> -	}
> -	spin_unlock(&split_queue_lock);
> -
>  	/* complete memcg works before add pages to LRU */
>  	mem_cgroup_split_huge_fixup(head);
>  
> @@ -3299,12 +3292,20 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	freeze_page(anon_vma, head);
>  	VM_BUG_ON_PAGE(compound_mapcount(head), head);
>  
> +	/* Prevent deferred_split_scan() touching ->_count */
> +	spin_lock(&split_queue_lock);
>  	count = page_count(head);
>  	mapcount = total_mapcount(head);
>  	if (mapcount == count - 1) {
> +		if (!list_empty(page_deferred_list(head))) {
> +			split_queue_len--;
> +			list_del(page_deferred_list(head));
> +		}
> +		spin_unlock(&split_queue_lock);
>  		__split_huge_page(page, list);
>  		ret = 0;
>  	} else if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount > count - 1) {
> +		spin_unlock(&split_queue_lock);
>  		pr_alert("total_mapcount: %u, page_count(): %u\n",
>  				mapcount, count);
>  		if (PageTail(page))
> @@ -3312,6 +3313,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  		dump_page(page, "total_mapcount(head) > page_count(head) - 1");
>  		BUG();
>  	} else {
> +		spin_unlock(&split_queue_lock);
>  		unfreeze_page(anon_vma, head);
>  		ret = -EBUSY;
>  	}
> -- 
>  Kirill A. Shutemov
> 

It seems to solve that BUG_ON. One guest which doesn't include the above fix hit
the BUG_ON within 10 hours. However, another machine with the above fix has run
for more than 1 day without the BUG_ON, but it introduces a new problem.

        BUG: Bad rss-counter state mm:ffff88007f411c00 idx:0 val:-1
        BUG: Bad rss-counter state mm:ffff88007f411c00 idx:1 val:1


* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-19  2:12                                                 ` Minchan Kim
@ 2015-11-19  6:58                                                   ` Kirill A. Shutemov
  2015-11-19 10:10                                                     ` yalin wang
  2015-11-25  7:21                                                     ` Minchan Kim
  0 siblings, 2 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2015-11-19  6:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > During the test with MADV_FREE on a kernel with your patches applied,
> > > > > > > I couldn't see any problem.
> > > > > > > 
> > > > > > > However, in this round, I did another test which is the same one
> > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > (memcg things/kill/swapoff), for a long-lived run of the test program.
> > > > > > 
> > > > > > Could you share updated test?
> > > > > 
> > > > > It's part of my testing suite so I should factor it out.
> > > > > I will send it when I go to office tomorrow.
> > > > 
> > > > Thanks.
> > > > 
> > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > > 
> > > > > > Before leaving the office, I queued it up and the result is below.
> > > > > > It seems you already fixed it but the fix isn't in mmotm yet. Right?
> > > > > > Anyway, please confirm and tell me which additional patches I should
> > > > > > apply on top of mmotm-2015-11-10-15-53 to follow up on your many
> > > > > > recent bug fix patches.
> > > > > 
> > > > > My two patches which are not in the mmotm-2015-11-10-15-53 release:
> > > > 
> > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shutemov@linux.intel.com
> > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shutemov@linux.intel.com
> > > 
> > > 1. mm: fix __page_mapcount()
> > > 2. thp: fix leak due split_huge_page() vs. exit race
> > > 
> > > If I missed any patches, let me know.
> > > 
> > > I applied the above two patches on top of mmotm-2015-11-10-15-53 and tested again.
> > > But unfortunately, the result is below.
> > > 
> > > Now I am writing a test program I can send to you, but it is not easy
> > > because small changes made while factoring it out of the testing suite seem to
> > > change something (e.g., timing) and make the bug hard to reproduce. I will try again.
> > 
> > Your test suite seems to generate quite a few bug reports. Would you mind making
> > the whole suite public?
> 
> That's tough because it includes company-internal stuff.
> That's why I am trying to factor out the part I can share, but unfortunately
> I haven't found the time to retry until now. :(
> 
> >  
> > > page:ffffea0000240080 count:2 mapcount:1 mapping:ffff88007eff3321 index:0x600000e02
> > > flags: 0x4000000000040018(uptodate|dirty|swapbacked)
> > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > page->mem_cgroup:ffff880077cf0c00
> > > ------------[ cut here ]------------
> > > kernel BUG at mm/huge_memory.c:3272!
> > > invalid opcode: 0000 [#1] SMP 
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: ffff880073441a40 ti: ffff88007344c000 task.ti: ffff88007344c000
> > > RIP: 0010:[<ffffffff8114bc9b>]  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
> > > RSP: 0018:ffff88007344f968  EFLAGS: 00010286
> > > RAX: 0000000000000021 RBX: ffffea0000240080 RCX: 0000000000000000
> > > RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
> > > RBP: ffff88007344f9e8 R08: 0000000000000000 R09: ffff8800000bc600
> > > R10: ffffffff8163e2c0 R11: 0000000000004b47 R12: ffffea0000240080
> > > R13: ffffea0000240088 R14: ffffea0000240080 R15: 0000000000000000
> > > FS:  0000000000000000(0000) GS:ffff880078300000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 00007ffd59edcd68 CR3: 0000000001808000 CR4: 00000000000006a0
> > > Stack:
> > >  cccccccccccccccd ffffea0000240080 ffff88007344fa00 ffffea0000240088
> > >  ffff88007344fa00 0000000000000000 ffff88007344f9e8 ffffffff810f0200
> > >  ffffea0000240000 0000000000000000 0000000000000000 ffffea0000240080
> > > Call Trace:
> > >  [<ffffffff810f0200>] ? __lock_page+0xa0/0xb0
> > >  [<ffffffff8114bdc5>] deferred_split_scan+0x115/0x240
> > >  [<ffffffff8111851c>] ? list_lru_count_one+0x1c/0x30
> > >  [<ffffffff811018d3>] shrink_slab.part.42+0x1e3/0x350
> > >  [<ffffffff8110644a>] shrink_zone+0x26a/0x280
> > >  [<ffffffff8110658d>] do_try_to_free_pages+0x12d/0x3b0
> > >  [<ffffffff811068c4>] try_to_free_pages+0xb4/0x140
> > >  [<ffffffff810f9279>] __alloc_pages_nodemask+0x459/0x920
> > >  [<ffffffff8108d750>] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > >  [<ffffffff81147465>] khugepaged+0x155/0x1b10
> > >  [<ffffffff81073ca0>] ? prepare_to_wait_event+0xf0/0xf0
> > >  [<ffffffff81147310>] ? __split_huge_pmd_locked+0x4e0/0x4e0
> > >  [<ffffffff81057e49>] kthread+0xc9/0xe0
> > >  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
> > >  [<ffffffff8142aa6f>] ret_from_fork+0x3f/0x70
> > >  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
> > > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> > > RIP  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
> > >  RSP <ffff88007344f968>
> > > ---[ end trace 0ee39378e850d8de ]---
> > > Kernel panic - not syncing: Fatal exception
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Kernel Offset: disabled
> > 
> > I looked more into it. It seems to be a race between split_huge_page() and
> > deferred_split_scan(), as the dumped page is not huge.
> > 
> > Could you check if the patch below makes any difference to the situation?
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 91e2f4b7ca39..923c0f6eb50a 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3186,13 +3186,6 @@ static void __split_huge_page(struct page *page, struct list_head *list)
> >  	spin_lock_irq(&zone->lru_lock);
> >  	lruvec = mem_cgroup_page_lruvec(head, zone);
> >  
> > -	spin_lock(&split_queue_lock);
> > -	if (!list_empty(page_deferred_list(head))) {
> > -		split_queue_len--;
> > -		list_del(page_deferred_list(head));
> > -	}
> > -	spin_unlock(&split_queue_lock);
> > -
> >  	/* complete memcg works before add pages to LRU */
> >  	mem_cgroup_split_huge_fixup(head);
> >  
> > @@ -3299,12 +3292,20 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> >  	freeze_page(anon_vma, head);
> >  	VM_BUG_ON_PAGE(compound_mapcount(head), head);
> >  
> > +	/* Prevent deferred_split_scan() touching ->_count */
> > +	spin_lock(&split_queue_lock);
> >  	count = page_count(head);
> >  	mapcount = total_mapcount(head);
> >  	if (mapcount == count - 1) {
> > +		if (!list_empty(page_deferred_list(head))) {
> > +			split_queue_len--;
> > +			list_del(page_deferred_list(head));
> > +		}
> > +		spin_unlock(&split_queue_lock);
> >  		__split_huge_page(page, list);
> >  		ret = 0;
> >  	} else if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount > count - 1) {
> > +		spin_unlock(&split_queue_lock);
> >  		pr_alert("total_mapcount: %u, page_count(): %u\n",
> >  				mapcount, count);
> >  		if (PageTail(page))
> > @@ -3312,6 +3313,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> >  		dump_page(page, "total_mapcount(head) > page_count(head) - 1");
> >  		BUG();
> >  	} else {
> > +		spin_unlock(&split_queue_lock);
> >  		unfreeze_page(anon_vma, head);
> >  		ret = -EBUSY;
> >  	}
> > -- 
> >  Kirill A. Shutemov
> > 
> 
> It seems to solve that BUG_ON. One guest which doesn't include the above fix hit
> the BUG_ON within 10 hours. However, another machine with the above fix has run
> for more than 1 day without the BUG_ON, but it introduces a new problem.
> 
>         BUG: Bad rss-counter state mm:ffff88007f411c00 idx:0 val:-1
>         BUG: Bad rss-counter state mm:ffff88007f411c00 idx:1 val:1

That's rather strange: looks like one file page was charged as anon or
one anon page was uncharged as file. Not sure yet how this can be caused
by my THP patchset :/
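
One way to read those two counter lines, as a hedged userspace sketch rather
than kernel code: if idx:0 counts file pages and idx:1 counts anon pages (the
4.3-era MM_FILEPAGES/MM_ANONPAGES order, which is assumed here), then a page
accounted as anon when mapped and as file when unmapped leaves exactly -1 and
+1 behind. All model_* names are hypothetical.

```c
/*
 * Userspace model only; names are hypothetical. The index order mirrors the
 * 4.3-era enum (file pages at 0, anon pages at 1), which is an assumption
 * about the kernel that produced the report.
 */
#include <assert.h>

enum model_counter {
	MODEL_FILEPAGES = 0,
	MODEL_ANONPAGES = 1,
	MODEL_NR_COUNTERS
};

struct model_mm {
	long rss[MODEL_NR_COUNTERS];
};

/* Account one page when it is mapped. */
void model_map_page(struct model_mm *mm, enum model_counter as)
{
	mm->rss[as]++;
}

/* Account one page when it is unmapped. */
void model_unmap_page(struct model_mm *mm, enum model_counter as)
{
	mm->rss[as]--;
}

/* Count the counters left non-zero at teardown, as an exit-time check would. */
int model_bad_counters(const struct model_mm *mm)
{
	int bad = 0;

	for (int i = 0; i < MODEL_NR_COUNTERS; i++)
		if (mm->rss[i] != 0)
			bad++;
	return bad;
}
```

Mapping one page as MODEL_ANONPAGES and unmapping it as MODEL_FILEPAGES leaves
rss[0] at -1 and rss[1] at 1, which matches the shape of the two reported
lines. The model says nothing about which kernel path mixes the two up.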

-- 
 Kirill A. Shutemov

* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-19  6:58                                                   ` Kirill A. Shutemov
@ 2015-11-19 10:10                                                     ` yalin wang
  2015-11-25  7:21                                                     ` Minchan Kim
  1 sibling, 0 replies; 33+ messages in thread
From: yalin wang @ 2015-11-19 10:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Minchan Kim, Hugh Dickins, Sasha Levin, Andrew Morton,
	open list:MEMORY MANAGEMENT, linux-kernel, Rik van Riel,
	Mel Gorman, Michal Hocko, Johannes Weiner, Vlastimil Babka


> On Nov 19, 2015, at 14:58, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> uncharged
I also encounter this crash.

I also encounter a crash like this in qemu:


[    2.703436] [<ffffffc0001d4d2c>] do_execveat_common.isra.36+0x4f0/0x630
[    2.703624] [<ffffffc0001d4e90>] do_execve+0x24/0x30
[    2.703767] [<ffffffc0001d50e0>] SyS_execve+0x1c/0x2c
[    2.703923] BUG: Bad page map in process init  pte:6000004837ebd3 pmd:b29e7003
[    2.704140] page:ffffffc07f00af80 count:2 mapcount:-1 mapping:          (null) index:0x1
[    2.704414] flags: 0x400000000014(referenced|dirty)
[    2.704563] page dumped because: bad pte
[    2.704666] addr:0000007fafb7e000 vm_flags:00100073 anon_vma:ffffffc0729bdb90 mapping:          (null) index:7fafb7e
[    2.704906] file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
[    2.705117] CPU: 0 PID: 84 Comm: init Tainted: G    B           4.2.0ajb-00005-g11a9bf3 #80
[    2.705315] Hardware name: ranchu (DT)
[    2.705408] Call trace:
[    2.705488] [<ffffffc000089ea0>] dump_backtrace+0x0/0x124
[    2.705657] [<ffffffc000089fd4>] show_stack+0x10/0x1c
[    2.705797] [<ffffffc0005f1df0>] dump_stack+0x78/0x98
[    2.705971] [<ffffffc00018a8d4>] print_bad_pte+0x154/0x1f0
[    2.706102] [<ffffffc00018c5f4>] unmap_single_vma+0x574/0x704
[    2.706236] [<ffffffc00018d0a4>] unmap_vmas+0x54/0x70
[    2.706354] [<ffffffc000195e70>] exit_mmap+0x88/0xfc
[    2.706473] [<ffffffc000097af4>] mmput+0x48/0xe8
[    2.706584] [<ffffffc0001d3b64>] flush_old_exec+0x30c/0x79c
[    2.706719] [<ffffffc000225fa4>] load_elf_binary+0x21c/0x1098
[    2.706856] [<ffffffc0001d4330>] search_binary_handler+0xa8/0x224
[    2.706995] [<ffffffc0001d4d2c>] do_execveat_common.isra.36+0x4f0/0x630
[    2.707144] [<ffffffc0001d4e90>] do_execve+0x24/0x30
[    2.707263] [<ffffffc0001d50e0>] SyS_execve+0x1c/0x2c
[    2.707392] BUG: Bad page map in process init  pte:6000004837fbd3 pmd:b29e7003
[    2.707752] page:ffffffc07f00afc0 count:2 mapcount:-1 mapping:          (null) index:0x1
[    2.708167] flags: 0x400000000014(referenced|dirty)
[    2.708333] page dumped because: bad pte
[    2.708501] addr:0000007fafb7f000 vm_flags:00100073 anon_vma:ffffffc0729bdb90 mapping:          (null) index:7fafb7f
[    2.709084] file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
[    2.709306] CPU: 0 PID: 84 Comm: init Tainted: G    B           4.2.0ajb-00005-g11a9bf3 #80
[    2.709494] Hardware name: ranchu (DT)

It seems the page map count is not correct.
My build is based on mmotm-2015-10-21-14-41.

Thanks




* Re: kernel oops on mmotm-2015-10-15-15-20
  2015-11-19  6:58                                                   ` Kirill A. Shutemov
  2015-11-19 10:10                                                     ` yalin wang
@ 2015-11-25  7:21                                                     ` Minchan Kim
  1 sibling, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2015-11-25  7:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Sasha Levin, Andrew Morton, linux-mm, linux-kernel,
	Rik van Riel, Mel Gorman, Michal Hocko, Johannes Weiner,
	Vlastimil Babka

On Thu, Nov 19, 2015 at 08:58:27AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 19, 2015 at 11:12:21AM +0900, Minchan Kim wrote:
> > On Tue, Nov 17, 2015 at 11:32:13AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 17, 2015 at 04:35:39PM +0900, Minchan Kim wrote:
> > > > On Mon, Nov 16, 2015 at 12:54:53PM +0200, Kirill A. Shutemov wrote:
> > > > > On Mon, Nov 16, 2015 at 07:32:20PM +0900, Minchan Kim wrote:
> > > > > > On Mon, Nov 16, 2015 at 10:45:22AM +0200, Kirill A. Shutemov wrote:
> > > > > > > On Mon, Nov 16, 2015 at 10:45:21AM +0900, Minchan Kim wrote:
> > > > > > > > During the test with MADV_FREE on a kernel with your patches applied,
> > > > > > > > I couldn't see any problem.
> > > > > > > > 
> > > > > > > > However, in this round, I did another test which is the same one
> > > > > > > > I attached but a little bit different because it doesn't do
> > > > > > > > (memcg things/kill/swapoff), for a long-lived run of the test program.
> > > > > > > 
> > > > > > > Could you share updated test?
> > > > > > 
> > > > > > It's part of my testing suite so I should factor it out.
> > > > > > I will send it when I go to office tomorrow.
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > > > And could you try to reproduce it on clean mmotm-2015-11-10-15-53?
> > > > > > 
> > > > > > Before leaving the office, I queued it up and the result is below.
> > > > > > It seems you have already fixed it but haven't applied the fix to mmotm yet. Right?
> > > > > > Anyway, please confirm and tell me which patches I should add on top of
> > > > > > mmotm-2015-11-10-15-53 to pick up your many recent bug fixes.
> > > > > 
> > > > > My two patches that are not in the mmotm-2015-11-10-15-53 release:
> > > > > 
> > > > > http://lkml.kernel.org/g/1447236557-68682-1-git-send-email-kirill.shutemov@linux.intel.com
> > > > > http://lkml.kernel.org/g/1447236567-68751-1-git-send-email-kirill.shutemov@linux.intel.com
> > > > 
> > > > 1. mm: fix __page_mapcount()
> > > > 2. thp: fix leak due split_huge_page() vs. exit race
> > > > 
> > > > If I missed some patches, let me know it.
> > > > 
> > > > I applied above two patches based on mmotm-2015-11-10-15-53 and tested again.
> > > > But unfortunately, the result was below.
> > > > 
> > > > Now I am writing a test program that I can send to you, but it is not easy:
> > > > small changes made while factoring it out of the testing suite seem to change
> > > > something (e.g., timing) and make the bug hard to reproduce. I will try again.
> > > 
> > > Your test suite seems to generate quite a few bug reports. Would you mind
> > > making the whole suite public?
> > 
> > It's tough because it includes company-internal stuff.
> > That's why I am trying to factor out the part I can share, but unfortunately
> > I couldn't find time to retry until now. :(
> > 
> > >  
> > > > page:ffffea0000240080 count:2 mapcount:1 mapping:ffff88007eff3321 index:0x600000e02
> > > > flags: 0x4000000000040018(uptodate|dirty|swapbacked)
> > > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > > page->mem_cgroup:ffff880077cf0c00
> > > > ------------[ cut here ]------------
> > > > kernel BUG at mm/huge_memory.c:3272!
> > > > invalid opcode: 0000 [#1] SMP 
> > > > Dumping ftrace buffer:
> > > >    (ftrace buffer empty)
> > > > Modules linked in:
> > > > CPU: 8 PID: 59 Comm: khugepaged Not tainted 4.3.0-mm1-kirill+ #8
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > task: ffff880073441a40 ti: ffff88007344c000 task.ti: ffff88007344c000
> > > > RIP: 0010:[<ffffffff8114bc9b>]  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
> > > > RSP: 0018:ffff88007344f968  EFLAGS: 00010286
> > > > RAX: 0000000000000021 RBX: ffffea0000240080 RCX: 0000000000000000
> > > > RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffff821df4d8
> > > > RBP: ffff88007344f9e8 R08: 0000000000000000 R09: ffff8800000bc600
> > > > R10: ffffffff8163e2c0 R11: 0000000000004b47 R12: ffffea0000240080
> > > > R13: ffffea0000240088 R14: ffffea0000240080 R15: 0000000000000000
> > > > FS:  0000000000000000(0000) GS:ffff880078300000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > CR2: 00007ffd59edcd68 CR3: 0000000001808000 CR4: 00000000000006a0
> > > > Stack:
> > > >  cccccccccccccccd ffffea0000240080 ffff88007344fa00 ffffea0000240088
> > > >  ffff88007344fa00 0000000000000000 ffff88007344f9e8 ffffffff810f0200
> > > >  ffffea0000240000 0000000000000000 0000000000000000 ffffea0000240080
> > > > Call Trace:
> > > >  [<ffffffff810f0200>] ? __lock_page+0xa0/0xb0
> > > >  [<ffffffff8114bdc5>] deferred_split_scan+0x115/0x240
> > > >  [<ffffffff8111851c>] ? list_lru_count_one+0x1c/0x30
> > > >  [<ffffffff811018d3>] shrink_slab.part.42+0x1e3/0x350
> > > >  [<ffffffff8110644a>] shrink_zone+0x26a/0x280
> > > >  [<ffffffff8110658d>] do_try_to_free_pages+0x12d/0x3b0
> > > >  [<ffffffff811068c4>] try_to_free_pages+0xb4/0x140
> > > >  [<ffffffff810f9279>] __alloc_pages_nodemask+0x459/0x920
> > > >  [<ffffffff8108d750>] ? trace_event_raw_event_tick_stop+0xd0/0xd0
> > > >  [<ffffffff81147465>] khugepaged+0x155/0x1b10
> > > >  [<ffffffff81073ca0>] ? prepare_to_wait_event+0xf0/0xf0
> > > >  [<ffffffff81147310>] ? __split_huge_pmd_locked+0x4e0/0x4e0
> > > >  [<ffffffff81057e49>] kthread+0xc9/0xe0
> > > >  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
> > > >  [<ffffffff8142aa6f>] ret_from_fork+0x3f/0x70
> > > >  [<ffffffff81057d80>] ? kthread_park+0x60/0x60
> > > > Code: ff ff 48 c7 c6 00 cd 77 81 4c 89 f7 e8 df ce fc ff 0f 0b 48 83 e8 01 e9 94 f7 ff ff 48 c7 c6 80 bb 77 81 4c 89 f7 e8 c5 ce fc ff <0f> 0b 48 c7 c6 48 c9 77 81 4c 89 e7 e8 b4 ce fc ff 0f 0b 66 90 
> > > > RIP  [<ffffffff8114bc9b>] split_huge_page_to_list+0x8fb/0x910
> > > >  RSP <ffff88007344f968>
> > > > ---[ end trace 0ee39378e850d8de ]---
> > > > Kernel panic - not syncing: Fatal exception
> > > > Dumping ftrace buffer:
> > > >    (ftrace buffer empty)
> > > > Kernel Offset: disabled
> > > 
> > > I looked more into it. It seems a race between split_huge_page() and
> > > deferred_split_scan() as the dumped page is not huge.
> > > 
> > > Could you check if the patch below makes any difference to the situation?
> > > 
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 91e2f4b7ca39..923c0f6eb50a 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -3186,13 +3186,6 @@ static void __split_huge_page(struct page *page, struct list_head *list)
> > >  	spin_lock_irq(&zone->lru_lock);
> > >  	lruvec = mem_cgroup_page_lruvec(head, zone);
> > >  
> > > -	spin_lock(&split_queue_lock);
> > > -	if (!list_empty(page_deferred_list(head))) {
> > > -		split_queue_len--;
> > > -		list_del(page_deferred_list(head));
> > > -	}
> > > -	spin_unlock(&split_queue_lock);
> > > -
> > >  	/* complete memcg works before add pages to LRU */
> > >  	mem_cgroup_split_huge_fixup(head);
> > >  
> > > @@ -3299,12 +3292,20 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> > >  	freeze_page(anon_vma, head);
> > >  	VM_BUG_ON_PAGE(compound_mapcount(head), head);
> > >  
> > > +	/* Prevent deferred_split_scan() touching ->_count */
> > > +	spin_lock(&split_queue_lock);
> > >  	count = page_count(head);
> > >  	mapcount = total_mapcount(head);
> > >  	if (mapcount == count - 1) {
> > > +		if (!list_empty(page_deferred_list(head))) {
> > > +			split_queue_len--;
> > > +			list_del(page_deferred_list(head));
> > > +		}
> > > +		spin_unlock(&split_queue_lock);
> > >  		__split_huge_page(page, list);
> > >  		ret = 0;
> > >  	} else if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount > count - 1) {
> > > +		spin_unlock(&split_queue_lock);
> > >  		pr_alert("total_mapcount: %u, page_count(): %u\n",
> > >  				mapcount, count);
> > >  		if (PageTail(page))
> > > @@ -3312,6 +3313,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> > >  		dump_page(page, "total_mapcount(head) > page_count(head) - 1");
> > >  		BUG();
> > >  	} else {
> > > +		spin_unlock(&split_queue_lock);
> > >  		unfreeze_page(anon_vma, head);
> > >  		ret = -EBUSY;
> > >  	}
> > > -- 
> > >  Kirill A. Shutemov
> > > 
> > 
> > It seems to solve that BUG_ON. One guest without the above fix hit
> > the BUG_ON within 10 hours. However, another machine with the above fix has run
> > for more than a day without the BUG_ON, but it introduces a new problem.
> > 
> >         BUG: Bad rss-counter state mm:ffff88007f411c00 idx:0 val:-1
> >         BUG: Bad rss-counter state mm:ffff88007f411c00 idx:1 val:1
> 
> That's rather strange: looks like one file page was charged as anon or
> one anon page was uncharged as file. Not sure yet how this can be caused
> by my THP patchset :/

I couldn't reproduce this problem in another test that has been running
for a week; that test hasn't hit any problem so far.
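
For reference, the two "Bad rss-counter state" lines quoted above can be
read with a small user-space sketch. It assumes the counter order used by
kernels of this era (idx 0 is file-backed RSS, idx 1 is anonymous RSS).
All toy_* names are hypothetical and are not kernel functions; this only
illustrates the accounting mismatch suggested above, not its cause.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed index order for 4.3-era kernels: 0 = file RSS, 1 = anon RSS. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, NR_MM_COUNTERS };

/* Hypothetical stand-in for the per-mm RSS counters. */
struct toy_mm {
	long rss_stat[NR_MM_COUNTERS];
};

/* Account one page of the given type as mapped. */
static void toy_map_page(struct toy_mm *mm, int member)
{
	assert(member >= 0 && member < NR_MM_COUNTERS);
	mm->rss_stat[member]++;
}

/* Account one page of the given type as unmapped. */
static void toy_unmap_page(struct toy_mm *mm, int member)
{
	assert(member >= 0 && member < NR_MM_COUNTERS);
	mm->rss_stat[member]--;
}

/*
 * Report every counter that is not zero at teardown, in the style of the
 * messages in the report. Returns the number of bad counters.
 */
static int toy_check_mm(const struct toy_mm *mm)
{
	int bad = 0;

	for (int i = 0; i < NR_MM_COUNTERS; i++) {
		if (mm->rss_stat[i]) {
			printf("BUG: Bad rss-counter state idx:%d val:%ld\n",
			       i, mm->rss_stat[i]);
			bad++;
		}
	}
	return bad;
}

/* One page counted as anon when mapped but as file when unmapped. */
static int toy_mismatch(struct toy_mm *mm)
{
	toy_map_page(mm, MM_ANONPAGES);
	toy_unmap_page(mm, MM_FILEPAGES);
	return toy_check_mm(mm);
}
```

Calling toy_mismatch() on a zeroed struct toy_mm leaves idx 0 at -1 and
idx 1 at 1, the same pair of values as in the report.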

Thanks.
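
A note on the locking change in the quoted patch: the idea, as I read it,
is that the refcount check and the removal from the deferred split queue
happen in one critical section, so the scanner cannot take a reference in
between. Below is a simplified user-space sketch of that pattern with a
pthread mutex. All toy_* names are hypothetical stand-ins, and the real
kernel code differs in many details.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a huge page on the deferred queue. */
struct toy_page {
	int count;	/* plays the role of page_count(head) */
	int mapcount;	/* plays the role of total_mapcount(head) */
	bool queued;	/* true while on the deferred split queue */
};

static pthread_mutex_t toy_queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int toy_queue_len;

/* Put a page on the queue if it is not already there. */
static void toy_queue_add(struct toy_page *p)
{
	pthread_mutex_lock(&toy_queue_lock);
	if (!p->queued) {
		p->queued = true;
		toy_queue_len++;
	}
	pthread_mutex_unlock(&toy_queue_lock);
}

/*
 * Scanner side: a page may be pinned only under the queue lock and only
 * while it is still queued. Returns true if a reference was taken.
 */
static bool toy_scan_pin(struct toy_page *p)
{
	bool pinned = false;

	pthread_mutex_lock(&toy_queue_lock);
	if (p->queued) {
		p->count++;
		pinned = true;
	}
	pthread_mutex_unlock(&toy_queue_lock);
	return pinned;
}

/*
 * Splitter side: read the counts and dequeue under the same lock, so the
 * scanner cannot raise count between the check and the dequeue.
 * Returns 0 if the split may proceed, -1 if the page is busy.
 */
static int toy_try_split(struct toy_page *p)
{
	int ret = -1;

	pthread_mutex_lock(&toy_queue_lock);
	if (p->mapcount == p->count - 1) {
		if (p->queued) {
			assert(toy_queue_len > 0);
			p->queued = false;
			toy_queue_len--;
		}
		ret = 0;
	}
	pthread_mutex_unlock(&toy_queue_lock);
	return ret;
}
```

With this ordering, once toy_try_split() succeeds the page is off the
queue and toy_scan_pin() can no longer touch its count. If the scanner
pinned the page first, the count check fails and the split backs off.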


end of thread, other threads:[~2015-11-25  7:21 UTC | newest]

Thread overview: 33+ messages
2015-10-21  5:28 kernel oops on mmotm-2015-10-15-15-20 Minchan Kim
2015-10-21 11:07 ` Kirill A. Shutemov
2015-10-22  0:06   ` Minchan Kim
2015-10-22  0:59     ` Hugh Dickins
2015-10-22  1:21       ` Minchan Kim
2015-10-22  9:00         ` Minchan Kim
2015-10-29  0:25           ` Kirill A. Shutemov
2015-10-29  7:58             ` Minchan Kim
2015-10-29  9:43               ` Kirill A. Shutemov
2015-10-29  9:52               ` Kirill A. Shutemov
2015-10-30  7:03                 ` Minchan Kim
2015-11-02 12:57                   ` Kirill A. Shutemov
2015-11-03  3:02                     ` Minchan Kim
2015-11-03  7:16                       ` Kirill A. Shutemov
2015-11-03  7:33                         ` Minchan Kim
2015-11-03 15:20                           ` Minchan Kim
2015-11-04 14:21                             ` Kirill A. Shutemov
2015-11-05  0:19                               ` Minchan Kim
2015-11-08 22:55                                 ` Kirill A. Shutemov
2015-11-12  0:36                                   ` Minchan Kim
2015-11-16  1:45                                     ` Minchan Kim
2015-11-16  8:45                                       ` Kirill A. Shutemov
2015-11-16 10:32                                         ` Minchan Kim
2015-11-16 10:54                                           ` Kirill A. Shutemov
2015-11-17  7:35                                             ` Minchan Kim
2015-11-17  9:32                                               ` Kirill A. Shutemov
2015-11-19  2:12                                                 ` Minchan Kim
2015-11-19  6:58                                                   ` Kirill A. Shutemov
2015-11-19 10:10                                                     ` yalin wang
2015-11-25  7:21                                                     ` Minchan Kim
2015-10-22  2:15 ` Hugh Dickins
2015-10-22  4:25   ` Hugh Dickins
2015-10-22 22:26     ` Hugh Dickins
