linux-kernel.vger.kernel.org archive mirror
* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  5:10 [BUG] kernel BUG at mm/memcontrol.c:1074! Sasha Levin
@ 2012-01-19  3:23 ` KAMEZAWA Hiroyuki
  2012-01-19  3:41   ` Hugh Dickins
  2012-01-19  5:52 ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-19  3:23 UTC (permalink / raw)
  To: Sasha Levin
  Cc: hannes, mhocko, bsingharora, Dave Jones, linux-kernel, cgroups, linux-mm

On Thu, 19 Jan 2012 07:10:26 +0200
Sasha Levin <levinsasha928@gmail.com> wrote:

> Hi all,
> 
> During testing, I have triggered the OOM killer by mmap()ing a large block of memory. The OOM kicked in and tried to kill the process:
> 

Two questions.

1. What is the kernel version?
2. Do you have memcg mounted?

Thanks,
-Kame

> [  526.657446] trinity invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
> [  526.659083] trinity cpuset=/ mems_allowed=0
> [  526.659854] Pid: 2200, comm: trinity Not tainted 3.2.0-next-20120119-sasha #128
> [  526.661203] Call Trace:
> [  526.661703]  [<ffffffff82583260>] ? _raw_spin_unlock+0x30/0x60
> [  526.662839]  [<ffffffff8116aefe>] dump_header+0x7e/0x330
> [  526.663841]  [<ffffffff82583303>] ? _raw_spin_unlock_irqrestore+0x73/0xa0
> [  526.665104]  [<ffffffff81835b20>] ? ___ratelimit+0xd0/0x180
> [  526.666149]  [<ffffffff8116b5cd>] oom_kill_process+0x7d/0x2d0
> [  526.667224]  [<ffffffff8116bcc0>] out_of_memory+0x1d0/0x400
> [  526.668237]  [<ffffffff81171011>] __alloc_pages_nodemask+0x8f1/0x910
> [  526.669388]  [<ffffffff811a8870>] alloc_pages_current+0xa0/0x110
> [  526.670486]  [<ffffffff8116713f>] __page_cache_alloc+0x8f/0xa0
> [  526.671610]  [<ffffffff81167f3a>] filemap_fault+0x34a/0x4e0
> [  526.672666]  [<ffffffff8118779f>] __do_fault+0x7f/0x5c0
> [  526.673665]  [<ffffffff810de041>] ? get_parent_ip+0x11/0x50
> [  526.674744]  [<ffffffff81053900>] ? native_sched_clock+0x60/0x90
> [  526.675868]  [<ffffffff8118a6e1>] handle_pte_fault+0xa1/0xa20
> [  526.676941]  [<ffffffff81107cfe>] ? put_lock_stats.clone.18+0xe/0x40
> [  526.678118]  [<ffffffff81108012>] ? lock_release_holdtime+0xb2/0x160
> [  526.679300]  [<ffffffff8118c7ae>] handle_mm_fault+0x1ce/0x330
> [  526.680405]  [<ffffffff8107d94d>] do_page_fault+0x15d/0x4d0
> [  526.681464]  [<ffffffff810aaf53>] ? do_fork+0x73/0x340
> [  526.682440]  [<ffffffff811ebff5>] ? vfsmount_lock_local_unlock+0x55/0x80
> [  526.683645]  [<ffffffff811ec988>] ? mntput_no_expire+0x38/0x100
> [  526.684709]  [<ffffffff811ed46e>] ? mntput+0x1e/0x30
> [  526.685605]  [<ffffffff811ce463>] ? fput+0x1b3/0x2b0
> [  526.686514]  [<ffffffff81076d11>] do_async_page_fault+0x31/0x90
> [  526.687573]  [<ffffffff825843d5>] async_page_fault+0x25/0x30
> [  526.688585] Mem-Info:
> [  526.689000] Node 0 DMA per-cpu:
> [  526.689605] CPU    0: hi:    0, btch:   1 usd:   0
> [  526.690484] Node 0 DMA32 per-cpu:
> [  526.691171] CPU    0: hi:   90, btch:  15 usd:   0
> [  526.692085] active_anon:1218 inactive_anon:12 isolated_anon:0
> [  526.692087]  active_file:1 inactive_file:6 isolated_file:0
> [  526.692087]  immediate:0 unevictable:48358 dirty:6 writeback:0 unstable:0
> [  526.692088]  free:864 slab_reclaimable:1696 slab_unreclaimable:3992
> [  526.692089]  mapped:5 shmem:2 pagetables:141 bounce:0
> [  526.697504] Node 0 DMA free:1300kB min:108kB low:132kB high:160kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB immediate:0kB unevictable:14568kB isolated(anon):0kB isolated(file):0kB present:15656kB mlocked:14576kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> [  526.704557] lowmem_reserve[]: 0 299 299 299
> [  526.705458] Node 0 DMA32 free:2156kB min:2156kB low:2692kB high:3232kB active_anon:4872kB inactive_anon:48kB active_file:4kB inactive_file:24kB immediate:0kB unevictable:178864kB isolated(anon):0kB isolated(file):0kB present:306432kB mlocked:178880kB dirty:24kB writeback:0kB mapped:20kB shmem:8kB slab_reclaimable:6784kB slab_unreclaimable:15968kB kernel_stack:1376kB pagetables:532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:44 all_unreclaimable? yes
> [  526.712825] lowmem_reserve[]: 0 0 0 0
> [  526.713633] Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1308kB
> [  526.715878] Node 0 DMA32: 10*4kB 6*8kB 4*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2168kB
> [  526.718169] 10 total pagecache pages
> [  526.718820] 0 pages in swap cache
> [  526.719449] Swap cache stats: add 0, delete 0, find 0/0
> [  526.720392] Free swap  = 0kB
> [  526.720947] Total swap = 0kB
> [  526.722927] 81904 pages RAM
> [  526.723470] 14810 pages reserved
> [  526.724094] 558 pages shared
> [  526.724611] 65388 pages non-shared
> [  526.725239] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> [  526.726586] [ 2193]     0  2193     4505       92   0       0             0 sh
> [  526.727884] [ 2200]     0  2200     3959      560   0       0             0 trinity
> [  526.729301] [ 2201]     0  2201     3959      561   0       0             0 trinity
> [  526.730804] [13370]     0 13370   528247    48921   0       0             0 trinity
> [  526.732207] Out of memory: Kill process 13370 (trinity) score 700 or sacrifice child
> [  526.733624] Killed process 13370 (trinity) total-vm:2112988kB, anon-rss:195680kB, file-rss:4kB
> 
> So far, everything went on as expected.
> 
> The problem is, that it looks like this has triggered a BUG() in the memory cgroup code:
> 
> [  526.737227] ------------[ cut here ]------------
> [  526.738032] 
> [  526.738032] invalid opcode: 0000 [#1] PREEMPT SMP 
> [  526.738032] CPU 0 
> [  526.738032] Pid: 1091, comm: kswapd0 Not tainted 3.2.0-next-20120119-sasha #128  
> [  526.738032] RIP: 0010:[<ffffffff811c4b4a>]  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
> [  526.738032] RSP: 0018:ffff8800127139a0  EFLAGS: 00010046
> [  526.738032] RAX: 0000000000000001 RBX: ffffea0000358300 RCX: 0000000000000000
> [  526.738032] RDX: ffff880012c0b800 RSI: 0000000000000000 RDI: 0000000000000000
> [  526.738032] RBP: ffff8800127139b0 R08: ffff880012713ad0 R09: 0000000000000001
> [  526.738032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> [  526.738032] R13: ffffea0000358300 R14: ffffea0000358320 R15: 0000000000000001
> [  526.738032] FS:  0000000000000000(0000) GS:ffff880013a00000(0000) knlGS:0000000000000000
> [  526.738032] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  526.738032] CR2: 00007fea7fa42e66 CR3: 000000000c42a000 CR4: 00000000000406f0
> [  526.738032] DR0: ffffffff810aaee0 DR1: 0000000000000000 DR2: 0000000000000000
> [  526.738032] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000600
> [  526.738032] Process kswapd0 (pid: 1091, threadinfo ffff880012712000, task ffff880012f7d840)
> [  526.738032] Stack:
> [  526.738032]  ffff880012c0b968 ffff880012c0b968 ffff8800127139c0 ffffffff811c4f0a
> [  526.738032]  ffff880012713a70 ffffffff81178c63 ffff8800127139e0 ffffea00000cbba0
> [  526.738032]  ffff880012713a40 ffff880012713b08 0000000000000001 ffffffffffffffff
> [  526.738032] Call Trace:
> [  526.738032]  [<ffffffff811c4f0a>] mem_cgroup_lru_del+0x3a/0x40
> [  526.738032]  [<ffffffff81178c63>] isolate_lru_pages+0xe3/0x330
> [  526.738032]  [<ffffffff8117a11e>] ? shrink_inactive_list+0xce/0x480
> [  526.738032]  [<ffffffff8117a153>] shrink_inactive_list+0x103/0x480
> [  526.738032]  [<ffffffff811c2a46>] ? mem_cgroup_iter+0x176/0x310
> [  526.738032]  [<ffffffff810e2c55>] ? sched_clock_local+0x25/0x90
> [  526.738032]  [<ffffffff8117ac04>] shrink_mem_cgroup_zone+0x3f4/0x580
> [  526.738032]  [<ffffffff81107cfe>] ? put_lock_stats.clone.18+0xe/0x40
> [  526.738032]  [<ffffffff8117adfe>] shrink_zone+0x6e/0xa0
> [  526.738032]  [<ffffffff8117be65>] balance_pgdat+0x545/0x750
> [  526.738032]  [<ffffffff810de1ed>] ? sub_preempt_count+0x9d/0xd0
> [  526.738032]  [<ffffffff8117c233>] kswapd+0x1c3/0x320
> [  526.738032]  [<ffffffff810cee30>] ? abort_exclusive_wait+0xb0/0xb0
> [  526.738032]  [<ffffffff8117c070>] ? balance_pgdat+0x750/0x750
> [  526.738032]  [<ffffffff810ce06e>] kthread+0xbe/0xd0
> [  526.738032]  [<ffffffff82585df4>] kernel_thread_helper+0x4/0x10
> [  526.738032]  [<ffffffff810d8c88>] ? finish_task_switch+0x78/0x100
> [  526.738032]  [<ffffffff825840f8>] ? retint_restore_args+0x13/0x13
> [  526.738032]  [<ffffffff810cdfb0>] ? kthread_flush_work_fn+0x10/0x10
> [  526.738032]  [<ffffffff82585df0>] ? gs_change+0x13/0x13
> [  526.738032] Code: 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 80 00 00 00 00 8b 4b 68 eb ba 0f 1f 00 0f b6 4b 68 bb 01 00 00 00 d3 e3 48 63 cb eb c2 0f 0b <0f> 0b 0f 1f 40 00 55 48 89 e5 48 83 ec 60 48 89 5d d8 4c 89 65 
> [  526.738032] RIP  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
> [  526.738032]  RSP <ffff8800127139a0>
> [  526.738032] ---[ end trace 866f4f6c624b8d58 ]---
> 
> -- 
> 
> Sasha.
> 



* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  3:23 ` KAMEZAWA Hiroyuki
@ 2012-01-19  3:41   ` Hugh Dickins
  2012-01-19  4:03     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2012-01-19  3:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Sasha Levin, hannes, mhocko, bsingharora, Dave Jones,
	Andrew Morton, linux-kernel, cgroups, linux-mm

On Thu, 19 Jan 2012, KAMEZAWA Hiroyuki wrote:
> On Thu, 19 Jan 2012 07:10:26 +0200
> Sasha Levin <levinsasha928@gmail.com> wrote:
> 
> > Hi all,
> > 
> > During testing, I have triggered the OOM killer by mmap()ing a large block of memory. The OOM kicked in and tried to kill the process:
> > 
> 
> Two questions.
> 
> 1. What is the kernel version?

It says 3.2.0-next-20120119-sasha #128

> 2. Do you have memcg mounted?

I notice that, unlike Linus's git, this linux-next still has
mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch in.

I think that was well capable of oopsing in mem_cgroup_lru_del_list(),
since it didn't always know which lru a page belongs to.

I'm going to be optimistic and assume that was the cause.

Hugh


* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  3:41   ` Hugh Dickins
@ 2012-01-19  4:03     ` KAMEZAWA Hiroyuki
  2012-01-19  5:16       ` Hugh Dickins
  0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-19  4:03 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Sasha Levin, hannes, mhocko, bsingharora, Dave Jones,
	Andrew Morton, linux-kernel, cgroups, linux-mm

On Wed, 18 Jan 2012 19:41:44 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:

> On Thu, 19 Jan 2012, KAMEZAWA Hiroyuki wrote:
> > On Thu, 19 Jan 2012 07:10:26 +0200
> > Sasha Levin <levinsasha928@gmail.com> wrote:
> > 
> > > Hi all,
> > > 
> > > During testing, I have triggered the OOM killer by mmap()ing a large block of memory. The OOM kicked in and tried to kill the process:
> > > 
> > 
> > Two questions.
> > 
> > 1. What is the kernel version?
> 
> It says 3.2.0-next-20120119-sasha #128
> 
> > 2. Do you have memcg mounted?
> 
> I notice that, unlike Linus's git, this linux-next still has
> mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch in.
> 
> I think that was well capable of oopsing in mem_cgroup_lru_del_list(),
> since it didn't always know which lru a page belongs to.
> 
> I'm going to be optimistic and assume that was the cause.
> 
Hmm, because the log hits !memcg at lru "del", the page must have been added
to an LRU somewhere, and that lru must be determined by pc->mem_cgroup.

Once set, pc->mem_cgroup is never cleared, only overwritten. AFAIK, the only
place pc->mem_cgroup is set to NULL is initialization.
I wonder why it hits lru_del() rather than lru_add()...
................

Ahhhh, OK, it seems you are right. The patch contains code like the following:
==
+static void pagevec_putback_immediate_fn(struct page *page, void *arg)
+{
+       struct zone *zone = page_zone(page);
+
+       if (PageLRU(page)) {
+               enum lru_list lru = page_lru(page);
+               list_move(&page->lru, &zone->lru[lru].list);
+       }
+}
==
...this bypasses mem_cgroup_lru_add(), so we see the bug in lru_del()
rather than in lru_add().
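
A minimal user-space sketch of the reasoning above, not kernel code. It
assumes, as the message suggests, that a page can reach an LRU list through a
path that skips the memcg add hook. Every name in it (toy_memcg, toy_page,
toy_lru_add, toy_lru_add_bypass, toy_lru_del) is invented for illustration;
only the idea comes from the thread: the add path gives the page an owner, the
direct list manipulation does not, and the missing owner is only noticed on
removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_memcg {
    long lru_pages;              /* pages this group believes are on its LRU */
};

struct toy_page {
    bool on_lru;                 /* models PageLRU(page) */
    struct toy_memcg *memcg;     /* models pc->mem_cgroup, NULL after init */
};

static struct toy_memcg toy_root;  /* models the root group */

/* Normal path: the memcg hook runs, so the page always gets an owner. */
void toy_lru_add(struct toy_page *page)
{
    if (page->memcg == NULL)
        page->memcg = &toy_root;
    page->memcg->lru_pages++;
    page->on_lru = true;
}

/* Bypass path: the page is put on a list directly and no hook runs. */
void toy_lru_add_bypass(struct toy_page *page)
{
    page->on_lru = true;
}

/*
 * Removal path. Returns 0 on success and -1 at the point where this model
 * assumes the kernel would hit its "!memcg" check.
 */
int toy_lru_del(struct toy_page *page)
{
    assert(page->on_lru);        /* only pages on an LRU are removed */
    if (page->memcg == NULL)
        return -1;
    page->memcg->lru_pages--;
    page->on_lru = false;
    return 0;
}
```

Starting from a zero-initialised struct toy_page, toy_lru_add() followed by
toy_lru_del() returns 0, while toy_lru_add_bypass() followed by toy_lru_del()
returns -1: in this model the failure shows up at removal, not at insertion.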

Another question is who pushes pages onto the LRU before pc->mem_cgroup is set.
Anyway, I think we need to fix memcg to be LRU_IMMEDIATE aware.

Thanks,
-Kame






* [BUG] kernel BUG at mm/memcontrol.c:1074!
@ 2012-01-19  5:10 Sasha Levin
  2012-01-19  3:23 ` KAMEZAWA Hiroyuki
  2012-01-19  5:52 ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 11+ messages in thread
From: Sasha Levin @ 2012-01-19  5:10 UTC (permalink / raw)
  To: hannes, mhocko, bsingharora, kamezawa.hiroyu, Dave Jones
  Cc: linux-kernel, cgroups, linux-mm

Hi all,

During testing, I have triggered the OOM killer by mmap()ing a large block of memory. The OOM kicked in and tried to kill the process:

[  526.657446] trinity invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[  526.659083] trinity cpuset=/ mems_allowed=0
[  526.659854] Pid: 2200, comm: trinity Not tainted 3.2.0-next-20120119-sasha #128
[  526.661203] Call Trace:
[  526.661703]  [<ffffffff82583260>] ? _raw_spin_unlock+0x30/0x60
[  526.662839]  [<ffffffff8116aefe>] dump_header+0x7e/0x330
[  526.663841]  [<ffffffff82583303>] ? _raw_spin_unlock_irqrestore+0x73/0xa0
[  526.665104]  [<ffffffff81835b20>] ? ___ratelimit+0xd0/0x180
[  526.666149]  [<ffffffff8116b5cd>] oom_kill_process+0x7d/0x2d0
[  526.667224]  [<ffffffff8116bcc0>] out_of_memory+0x1d0/0x400
[  526.668237]  [<ffffffff81171011>] __alloc_pages_nodemask+0x8f1/0x910
[  526.669388]  [<ffffffff811a8870>] alloc_pages_current+0xa0/0x110
[  526.670486]  [<ffffffff8116713f>] __page_cache_alloc+0x8f/0xa0
[  526.671610]  [<ffffffff81167f3a>] filemap_fault+0x34a/0x4e0
[  526.672666]  [<ffffffff8118779f>] __do_fault+0x7f/0x5c0
[  526.673665]  [<ffffffff810de041>] ? get_parent_ip+0x11/0x50
[  526.674744]  [<ffffffff81053900>] ? native_sched_clock+0x60/0x90
[  526.675868]  [<ffffffff8118a6e1>] handle_pte_fault+0xa1/0xa20
[  526.676941]  [<ffffffff81107cfe>] ? put_lock_stats.clone.18+0xe/0x40
[  526.678118]  [<ffffffff81108012>] ? lock_release_holdtime+0xb2/0x160
[  526.679300]  [<ffffffff8118c7ae>] handle_mm_fault+0x1ce/0x330
[  526.680405]  [<ffffffff8107d94d>] do_page_fault+0x15d/0x4d0
[  526.681464]  [<ffffffff810aaf53>] ? do_fork+0x73/0x340
[  526.682440]  [<ffffffff811ebff5>] ? vfsmount_lock_local_unlock+0x55/0x80
[  526.683645]  [<ffffffff811ec988>] ? mntput_no_expire+0x38/0x100
[  526.684709]  [<ffffffff811ed46e>] ? mntput+0x1e/0x30
[  526.685605]  [<ffffffff811ce463>] ? fput+0x1b3/0x2b0
[  526.686514]  [<ffffffff81076d11>] do_async_page_fault+0x31/0x90
[  526.687573]  [<ffffffff825843d5>] async_page_fault+0x25/0x30
[  526.688585] Mem-Info:
[  526.689000] Node 0 DMA per-cpu:
[  526.689605] CPU    0: hi:    0, btch:   1 usd:   0
[  526.690484] Node 0 DMA32 per-cpu:
[  526.691171] CPU    0: hi:   90, btch:  15 usd:   0
[  526.692085] active_anon:1218 inactive_anon:12 isolated_anon:0
[  526.692087]  active_file:1 inactive_file:6 isolated_file:0
[  526.692087]  immediate:0 unevictable:48358 dirty:6 writeback:0 unstable:0
[  526.692088]  free:864 slab_reclaimable:1696 slab_unreclaimable:3992
[  526.692089]  mapped:5 shmem:2 pagetables:141 bounce:0
[  526.697504] Node 0 DMA free:1300kB min:108kB low:132kB high:160kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB immediate:0kB unevictable:14568kB isolated(anon):0kB isolated(file):0kB present:15656kB mlocked:14576kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  526.704557] lowmem_reserve[]: 0 299 299 299
[  526.705458] Node 0 DMA32 free:2156kB min:2156kB low:2692kB high:3232kB active_anon:4872kB inactive_anon:48kB active_file:4kB inactive_file:24kB immediate:0kB unevictable:178864kB isolated(anon):0kB isolated(file):0kB present:306432kB mlocked:178880kB dirty:24kB writeback:0kB mapped:20kB shmem:8kB slab_reclaimable:6784kB slab_unreclaimable:15968kB kernel_stack:1376kB pagetables:532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:44 all_unreclaimable? yes
[  526.712825] lowmem_reserve[]: 0 0 0 0
[  526.713633] Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1308kB
[  526.715878] Node 0 DMA32: 10*4kB 6*8kB 4*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2168kB
[  526.718169] 10 total pagecache pages
[  526.718820] 0 pages in swap cache
[  526.719449] Swap cache stats: add 0, delete 0, find 0/0
[  526.720392] Free swap  = 0kB
[  526.720947] Total swap = 0kB
[  526.722927] 81904 pages RAM
[  526.723470] 14810 pages reserved
[  526.724094] 558 pages shared
[  526.724611] 65388 pages non-shared
[  526.725239] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  526.726586] [ 2193]     0  2193     4505       92   0       0             0 sh
[  526.727884] [ 2200]     0  2200     3959      560   0       0             0 trinity
[  526.729301] [ 2201]     0  2201     3959      561   0       0             0 trinity
[  526.730804] [13370]     0 13370   528247    48921   0       0             0 trinity
[  526.732207] Out of memory: Kill process 13370 (trinity) score 700 or sacrifice child
[  526.733624] Killed process 13370 (trinity) total-vm:2112988kB, anon-rss:195680kB, file-rss:4kB

So far, everything went on as expected.
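
As a cross-check that the report above is internally consistent, here is a
small sketch of the arithmetic behind two parts of it: the task table lists
total_vm and rss in pages while the "Killed process" line uses kB, and each
"Node 0 <zone>:" line sums free blocks per order. The 4 kB page size is an
assumption (it matches the smallest block size printed in the log), and the
helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_KB 4UL   /* assumption: 4 kB base pages */

/* Convert a page count from the task table into kB. */
unsigned long toy_pages_to_kb(unsigned long pages)
{
    return pages * TOY_PAGE_KB;
}

/*
 * Sum one "a*4kB b*8kB c*16kB ..." line. counts[i] is the number of free
 * blocks of order i, and each such block covers (4 << i) kB.
 */
unsigned long toy_free_kb(const unsigned long *counts, size_t nr_orders)
{
    unsigned long total = 0;
    size_t order;

    assert(counts != NULL);
    for (order = 0; order < nr_orders; order++)
        total += counts[order] * (TOY_PAGE_KB << order);
    return total;
}
```

For pid 13370, 528247 pages of total_vm give 2112988 kB, and 48921 pages of
rss give 195684 kB, which is the reported anon-rss (195680 kB) plus file-rss
(4 kB). Feeding the DMA counts {1,1,1,0,0,0,1,0,1,0,0} to toy_free_kb() gives
1308 kB, and the DMA32 counts {10,6,4,1,1,1,1,1,1,0,0} give 2168 kB, matching
the totals printed at the end of those lines.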

The problem is, that it looks like this has triggered a BUG() in the memory cgroup code:

[  526.737227] ------------[ cut here ]------------
[  526.738032] 
[  526.738032] invalid opcode: 0000 [#1] PREEMPT SMP 
[  526.738032] CPU 0 
[  526.738032] Pid: 1091, comm: kswapd0 Not tainted 3.2.0-next-20120119-sasha #128  
[  526.738032] RIP: 0010:[<ffffffff811c4b4a>]  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
[  526.738032] RSP: 0018:ffff8800127139a0  EFLAGS: 00010046
[  526.738032] RAX: 0000000000000001 RBX: ffffea0000358300 RCX: 0000000000000000
[  526.738032] RDX: ffff880012c0b800 RSI: 0000000000000000 RDI: 0000000000000000
[  526.738032] RBP: ffff8800127139b0 R08: ffff880012713ad0 R09: 0000000000000001
[  526.738032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
[  526.738032] R13: ffffea0000358300 R14: ffffea0000358320 R15: 0000000000000001
[  526.738032] FS:  0000000000000000(0000) GS:ffff880013a00000(0000) knlGS:0000000000000000
[  526.738032] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  526.738032] CR2: 00007fea7fa42e66 CR3: 000000000c42a000 CR4: 00000000000406f0
[  526.738032] DR0: ffffffff810aaee0 DR1: 0000000000000000 DR2: 0000000000000000
[  526.738032] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000600
[  526.738032] Process kswapd0 (pid: 1091, threadinfo ffff880012712000, task ffff880012f7d840)
[  526.738032] Stack:
[  526.738032]  ffff880012c0b968 ffff880012c0b968 ffff8800127139c0 ffffffff811c4f0a
[  526.738032]  ffff880012713a70 ffffffff81178c63 ffff8800127139e0 ffffea00000cbba0
[  526.738032]  ffff880012713a40 ffff880012713b08 0000000000000001 ffffffffffffffff
[  526.738032] Call Trace:
[  526.738032]  [<ffffffff811c4f0a>] mem_cgroup_lru_del+0x3a/0x40
[  526.738032]  [<ffffffff81178c63>] isolate_lru_pages+0xe3/0x330
[  526.738032]  [<ffffffff8117a11e>] ? shrink_inactive_list+0xce/0x480
[  526.738032]  [<ffffffff8117a153>] shrink_inactive_list+0x103/0x480
[  526.738032]  [<ffffffff811c2a46>] ? mem_cgroup_iter+0x176/0x310
[  526.738032]  [<ffffffff810e2c55>] ? sched_clock_local+0x25/0x90
[  526.738032]  [<ffffffff8117ac04>] shrink_mem_cgroup_zone+0x3f4/0x580
[  526.738032]  [<ffffffff81107cfe>] ? put_lock_stats.clone.18+0xe/0x40
[  526.738032]  [<ffffffff8117adfe>] shrink_zone+0x6e/0xa0
[  526.738032]  [<ffffffff8117be65>] balance_pgdat+0x545/0x750
[  526.738032]  [<ffffffff810de1ed>] ? sub_preempt_count+0x9d/0xd0
[  526.738032]  [<ffffffff8117c233>] kswapd+0x1c3/0x320
[  526.738032]  [<ffffffff810cee30>] ? abort_exclusive_wait+0xb0/0xb0
[  526.738032]  [<ffffffff8117c070>] ? balance_pgdat+0x750/0x750
[  526.738032]  [<ffffffff810ce06e>] kthread+0xbe/0xd0
[  526.738032]  [<ffffffff82585df4>] kernel_thread_helper+0x4/0x10
[  526.738032]  [<ffffffff810d8c88>] ? finish_task_switch+0x78/0x100
[  526.738032]  [<ffffffff825840f8>] ? retint_restore_args+0x13/0x13
[  526.738032]  [<ffffffff810cdfb0>] ? kthread_flush_work_fn+0x10/0x10
[  526.738032]  [<ffffffff82585df0>] ? gs_change+0x13/0x13
[  526.738032] Code: 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 80 00 00 00 00 8b 4b 68 eb ba 0f 1f 00 0f b6 4b 68 bb 01 00 00 00 d3 e3 48 63 cb eb c2 0f 0b <0f> 0b 0f 1f 40 00 55 48 89 e5 48 83 ec 60 48 89 5d d8 4c 89 65 
[  526.738032] RIP  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
[  526.738032]  RSP <ffff8800127139a0>
[  526.738032] ---[ end trace 866f4f6c624b8d58 ]---

-- 

Sasha.



* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  4:03     ` KAMEZAWA Hiroyuki
@ 2012-01-19  5:16       ` Hugh Dickins
  2012-01-19  5:29         ` KAMEZAWA Hiroyuki
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Hugh Dickins @ 2012-01-19  5:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Sasha Levin, hannes, mhocko, bsingharora, Dave Jones,
	Andrew Morton, Mel Gorman, linux-kernel, cgroups, linux-mm

On Thu, 19 Jan 2012, KAMEZAWA Hiroyuki wrote:
> On Wed, 18 Jan 2012 19:41:44 -0800 (PST)
> Hugh Dickins <hughd@google.com> wrote:
> > 
> > I notice that, unlike Linus's git, this linux-next still has
> > mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch in.
> > 
> > I think that was well capable of oopsing in mem_cgroup_lru_del_list(),
> > since it didn't always know which lru a page belongs to.
> > 
> > I'm going to be optimistic and assume that was the cause.
> > 
> Hmm, because the log hits !memcg at lru "del", the page must have been added
> to an LRU somewhere, and that lru must be determined by pc->mem_cgroup.
> 
> Once set, pc->mem_cgroup is never cleared, only overwritten. AFAIK, the only
> place pc->mem_cgroup is set to NULL is initialization.
> I wonder why it hits lru_del() rather than lru_add()...
> ................
> 
> Ahhhh, OK, it seems you are right. The patch contains code like the following:
> ==
> +static void pagevec_putback_immediate_fn(struct page *page, void *arg)
> +{
> +       struct zone *zone = page_zone(page);
> +
> +       if (PageLRU(page)) {
> +               enum lru_list lru = page_lru(page);
> +               list_move(&page->lru, &zone->lru[lru].list);
> +       }
> +}
> ==
> ...this bypasses mem_cgroup_lru_add(), so we see the bug in lru_del()
> rather than in lru_add().

I've not thought it through in detail (and your questioning reminds me
that the worst I saw from that patch was updating of the wrong counts,
leading to underflow, then livelock from the mismatch between empty list
and enormous count: I never saw an oops from it, and may be mistaken).
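
To make the failure mode in the parenthesis above concrete, here is a toy
sketch, assuming an unsigned per-list page counter kept separately from the
list itself. It is not the kernel's accounting code, and the names toy_lru,
toy_account_del and toy_scan are hypothetical. A removal accounted against the
wrong (empty) list wraps the counter, and a scan loop that trusts the counter
then spins without progress; the loop here is bounded so the example
terminates.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy LRU: a separately tracked counter and the real number of entries. */
struct toy_lru {
    unsigned long nr_counted;    /* what the statistics claim */
    unsigned long nr_linked;     /* what is actually on the list */
};

/* Account a removal against this list, whether or not the page was on it. */
void toy_account_del(struct toy_lru *lru)
{
    lru->nr_counted--;           /* unsigned: 0 - 1 wraps to ULONG_MAX */
}

/*
 * Scan loop that trusts the counter. Returns true if the counter reached
 * zero, false if max_rounds were used up first (the bounded stand-in for
 * a livelock).
 */
bool toy_scan(struct toy_lru *lru, unsigned long max_rounds)
{
    unsigned long round;

    assert(lru != NULL);
    for (round = 0; round < max_rounds; round++) {
        if (lru->nr_counted == 0)
            return true;         /* nothing left to do */
        if (lru->nr_linked > 0) {
            lru->nr_linked--;    /* take one page off the list */
            lru->nr_counted--;
        }
        /* else: the counter says pages remain but the list is empty */
    }
    return lru->nr_counted == 0;
}
```

A list with matching values, such as { 3, 3 }, drains and toy_scan() returns
true. A list at { 0, 0 } that receives one stray toy_account_del() ends up with
nr_counted == ULONG_MAX and an empty list, and toy_scan() returns false however
large the round budget is.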

> 
> Another question is who pushes pages onto the LRU before pc->mem_cgroup is set.
> Anyway, I think we need to fix memcg to be LRU_IMMEDIATE aware.

I don't think so: Mel agreed that the patch could not go forward as is,
without an additional pageflag, and asked Andrew to drop it from mmotm
in mail on 29th December (I didn't notice an mm-commits message to say
akpm did drop it, and marc is blacked out in protest for today, so I
cannot check: but certainly akpm left it out of his push to Linus).

Oh, and Mel noticed another bug in it on the 30th, that the PageLRU
check in the function you quote above is wrong: see PATCH 11/11 thread.

Hugh


* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  5:16       ` Hugh Dickins
@ 2012-01-19  5:29         ` KAMEZAWA Hiroyuki
  2012-01-19  6:59           ` KAMEZAWA Hiroyuki
  2012-01-19 15:05         ` Sasha Levin
  2012-01-19 16:49         ` Mel Gorman
  2 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-19  5:29 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Sasha Levin, hannes, mhocko, bsingharora, Dave Jones,
	Andrew Morton, Mel Gorman, linux-kernel, cgroups, linux-mm

On Wed, 18 Jan 2012 21:16:09 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:

> On Thu, 19 Jan 2012, KAMEZAWA Hiroyuki wrote:
> > On Wed, 18 Jan 2012 19:41:44 -0800 (PST)
> > Hugh Dickins <hughd@google.com> wrote:
> > > 
> > > I notice that, unlike Linus's git, this linux-next still has
> > > mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch in.
> > > 
> > > I think that was well capable of oopsing in mem_cgroup_lru_del_list(),
> > > since it didn't always know which lru a page belongs to.
> > > 
> > > I'm going to be optimistic and assume that was the cause.
> > > 
> > Hmm, because the log hits !memcg at lru "del", the page must have been added
> > to an LRU somewhere, and that lru must be determined by pc->mem_cgroup.
> > 
> > Once set, pc->mem_cgroup is never cleared, only overwritten. AFAIK, the only
> > place pc->mem_cgroup is set to NULL is initialization.
> > I wonder why it hits lru_del() rather than lru_add()...
> > ................
> > 
> > Ahhhh, OK, it seems you are right. The patch contains code like the following:
> > ==
> > +static void pagevec_putback_immediate_fn(struct page *page, void *arg)
> > +{
> > +       struct zone *zone = page_zone(page);
> > +
> > +       if (PageLRU(page)) {
> > +               enum lru_list lru = page_lru(page);
> > +               list_move(&page->lru, &zone->lru[lru].list);
> > +       }
> > +}
> > ==
> > ...this bypasses mem_cgroup_lru_add(), so we see the bug in lru_del()
> > rather than in lru_add().
> 
> I've not thought it through in detail (and your questioning reminds me
> that the worst I saw from that patch was updating of the wrong counts,
> leading to underflow, then livelock from the mismatch between empty list
> and enormous count: I never saw an oops from it, and may be mistaken).
> 
> > 
> > Another question is who pushes pages onto the LRU before pc->mem_cgroup is set.
> > Anyway, I think we need to fix memcg to be LRU_IMMEDIATE aware.
> 
> I don't think so: Mel agreed that the patch could not go forward as is,
> without an additional pageflag, and asked Andrew to drop it from mmotm
> in mail on 29th December (I didn't notice an mm-commits message to say
> akpm did drop it, and marc is blacked out in protest for today, so I
> cannot check: but certainly akpm left it out of his push to Linus).
> 
> Oh, and Mel noticed another bug in it on the 30th, that the PageLRU
> check in the function you quote above is wrong: see PATCH 11/11 thread.

Sure.

Hm, what I need to find is a path that adds a page to the LRU while bypassing memcg's check...

Thanks,
-Kame




* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  5:10 [BUG] kernel BUG at mm/memcontrol.c:1074! Sasha Levin
  2012-01-19  3:23 ` KAMEZAWA Hiroyuki
@ 2012-01-19  5:52 ` KAMEZAWA Hiroyuki
  2012-01-19  6:44   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-19  5:52 UTC (permalink / raw)
  To: Sasha Levin
  Cc: hannes, mhocko, bsingharora, Dave Jones, linux-kernel, cgroups, linux-mm

On Thu, 19 Jan 2012 07:10:26 +0200
Sasha Levin <levinsasha928@gmail.com> wrote:

> The problem is, that it looks like this has triggered a BUG() in the memory cgroup code:
> 
> [  526.737227] ------------[ cut here ]------------
> [  526.738032] 
> [  526.738032] invalid opcode: 0000 [#1] PREEMPT SMP 
> [  526.738032] CPU 0 
> [  526.738032] Pid: 1091, comm: kswapd0 Not tainted 3.2.0-next-20120119-sasha #128  
> [  526.738032] RIP: 0010:[<ffffffff811c4b4a>]  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
> [  526.738032] RSP: 0018:ffff8800127139a0  EFLAGS: 00010046
> [  526.738032] RAX: 0000000000000001 RBX: ffffea0000358300 RCX: 0000000000000000
> [  526.738032] RDX: ffff880012c0b800 RSI: 0000000000000000 RDI: 0000000000000000
> [  526.738032] RBP: ffff8800127139b0 R08: ffff880012713ad0 R09: 0000000000000001
> [  526.738032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> [  526.738032] R13: ffffea0000358300 R14: ffffea0000358320 R15: 0000000000000001
> [  526.738032] FS:  0000000000000000(0000) GS:ffff880013a00000(0000) knlGS:0000000000000000
> [  526.738032] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  526.738032] CR2: 00007fea7fa42e66 CR3: 000000000c42a000 CR4: 00000000000406f0
> [  526.738032] DR0: ffffffff810aaee0 DR1: 0000000000000000 DR2: 0000000000000000
> [  526.738032] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000600
> [  526.738032] Process kswapd0 (pid: 1091, threadinfo ffff880012712000, task ffff880012f7d840)
> [  526.738032] Stack:
> [  526.738032]  ffff880012c0b968 ffff880012c0b968 ffff8800127139c0 ffffffff811c4f0a
> [  526.738032]  ffff880012713a70 ffffffff81178c63 ffff8800127139e0 ffffea00000cbba0
> [  526.738032]  ffff880012713a40 ffff880012713b08 0000000000000001 ffffffffffffffff
> [  526.738032] Call Trace:
> [  526.738032]  [<ffffffff811c4f0a>] mem_cgroup_lru_del+0x3a/0x40
> [  526.738032]  [<ffffffff81178c63>] isolate_lru_pages+0xe3/0x330
> [  526.738032]  [<ffffffff8117a11e>] ? shrink_inactive_list+0xce/0x480
> [  526.738032]  [<ffffffff8117a153>] shrink_inactive_list+0x103/0x480
> [  526.738032]  [<ffffffff811c2a46>] ? mem_cgroup_iter+0x176/0x310
> [  526.738032]  [<ffffffff810e2c55>] ? sched_clock_local+0x25/0x90
> [  526.738032]  [<ffffffff8117ac04>] shrink_mem_cgroup_zone+0x3f4/0x580
> [  526.738032]  [<ffffffff81107cfe>] ? put_lock_stats.clone.18+0xe/0x40
> [  526.738032]  [<ffffffff8117adfe>] shrink_zone+0x6e/0xa0
> [  526.738032]  [<ffffffff8117be65>] balance_pgdat+0x545/0x750
> [  526.738032]  [<ffffffff810de1ed>] ? sub_preempt_count+0x9d/0xd0
> [  526.738032]  [<ffffffff8117c233>] kswapd+0x1c3/0x320
> [  526.738032]  [<ffffffff810cee30>] ? abort_exclusive_wait+0xb0/0xb0
> [  526.738032]  [<ffffffff8117c070>] ? balance_pgdat+0x750/0x750
> [  526.738032]  [<ffffffff810ce06e>] kthread+0xbe/0xd0
> [  526.738032]  [<ffffffff82585df4>] kernel_thread_helper+0x4/0x10
> [  526.738032]  [<ffffffff810d8c88>] ? finish_task_switch+0x78/0x100
> [  526.738032]  [<ffffffff825840f8>] ? retint_restore_args+0x13/0x13
> [  526.738032]  [<ffffffff810cdfb0>] ? kthread_flush_work_fn+0x10/0x10
> [  526.738032]  [<ffffffff82585df0>] ? gs_change+0x13/0x13
> [  526.738032] Code: 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 80 00 00 00 00 8b 4b 68 eb ba 0f 1f 00 0f b6 4b 68 bb 01 00 00 00 d3 e3 48 63 cb eb c2 0f 0b <0f> 0b 0f 1f 40 00 55 48 89 e5 48 83 ec 60 48 89 5d d8 4c 89 65 
> [  526.738032] RIP  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
> [  526.738032]  RSP <ffff8800127139a0>
> [  526.738032] ---[ end trace 866f4f6c624b8d58 ]---

My notes so far:

1. This is caused by pc->mem_cgroup being NULL in mem_cgroup_lru_del().

2. IIUC, PageLRU(page) must be true to trigger this BUG. So there is a page
   with pc->mem_cgroup == NULL but PageLRU(page) == true.
   But memcg's lru_add() routine accesses pc->mem_cgroup, so adding a page to
   the LRU with pc->mem_cgroup == NULL should already cause a NULL pointer
   access there.

   One possibility is that the page had PageLRU set but was added to the
   zone's LRU directly, not through memcg's LRU.
   Another is that PageLRU(page) was true although the page was not on any
   lru list, and pc->mem_cgroup was never updated.

3. IIUC, there is no routine that sets pc->mem_cgroup to NULL once a page is
   in use. But I need to check that...

Regards,
-Kame






* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  5:52 ` KAMEZAWA Hiroyuki
@ 2012-01-19  6:44   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-19  6:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Sasha Levin, hannes, mhocko, bsingharora, Dave Jones,
	linux-kernel, cgroups, linux-mm

On Thu, 19 Jan 2012 14:52:50 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Thu, 19 Jan 2012 07:10:26 +0200
> Sasha Levin <levinsasha928@gmail.com> wrote:
> 
> > The problem is, that it looks like this has triggered a BUG() in the memory cgroup code:
> > 
> > [  526.737227] ------------[ cut here ]------------
> > [  526.738032] 
> > [  526.738032] invalid opcode: 0000 [#1] PREEMPT SMP 
> > [  526.738032] CPU 0 
> > [  526.738032] Pid: 1091, comm: kswapd0 Not tainted 3.2.0-next-20120119-sasha #128  
> > [  526.738032] RIP: 0010:[<ffffffff811c4b4a>]  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
> > [  526.738032] RSP: 0018:ffff8800127139a0  EFLAGS: 00010046
> > [  526.738032] RAX: 0000000000000001 RBX: ffffea0000358300 RCX: 0000000000000000
> > [  526.738032] RDX: ffff880012c0b800 RSI: 0000000000000000 RDI: 0000000000000000
> > [  526.738032] RBP: ffff8800127139b0 R08: ffff880012713ad0 R09: 0000000000000001
> > [  526.738032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> > [  526.738032] R13: ffffea0000358300 R14: ffffea0000358320 R15: 0000000000000001
> > [  526.738032] FS:  0000000000000000(0000) GS:ffff880013a00000(0000) knlGS:0000000000000000
> > [  526.738032] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  526.738032] CR2: 00007fea7fa42e66 CR3: 000000000c42a000 CR4: 00000000000406f0
> > [  526.738032] DR0: ffffffff810aaee0 DR1: 0000000000000000 DR2: 0000000000000000
> > [  526.738032] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000600
> > [  526.738032] Process kswapd0 (pid: 1091, threadinfo ffff880012712000, task ffff880012f7d840)
> > [  526.738032] Stack:
> > [  526.738032]  ffff880012c0b968 ffff880012c0b968 ffff8800127139c0 ffffffff811c4f0a
> > [  526.738032]  ffff880012713a70 ffffffff81178c63 ffff8800127139e0 ffffea00000cbba0
> > [  526.738032]  ffff880012713a40 ffff880012713b08 0000000000000001 ffffffffffffffff
> > [  526.738032] Call Trace:
> > [  526.738032]  [<ffffffff811c4f0a>] mem_cgroup_lru_del+0x3a/0x40
> > [  526.738032]  [<ffffffff81178c63>] isolate_lru_pages+0xe3/0x330
> > [  526.738032]  [<ffffffff8117a11e>] ? shrink_inactive_list+0xce/0x480
> > [  526.738032]  [<ffffffff8117a153>] shrink_inactive_list+0x103/0x480
> > [  526.738032]  [<ffffffff811c2a46>] ? mem_cgroup_iter+0x176/0x310
> > [  526.738032]  [<ffffffff810e2c55>] ? sched_clock_local+0x25/0x90
> > [  526.738032]  [<ffffffff8117ac04>] shrink_mem_cgroup_zone+0x3f4/0x580
> > [  526.738032]  [<ffffffff81107cfe>] ? put_lock_stats.clone.18+0xe/0x40
> > [  526.738032]  [<ffffffff8117adfe>] shrink_zone+0x6e/0xa0
> > [  526.738032]  [<ffffffff8117be65>] balance_pgdat+0x545/0x750
> > [  526.738032]  [<ffffffff810de1ed>] ? sub_preempt_count+0x9d/0xd0
> > [  526.738032]  [<ffffffff8117c233>] kswapd+0x1c3/0x320
> > [  526.738032]  [<ffffffff810cee30>] ? abort_exclusive_wait+0xb0/0xb0
> > [  526.738032]  [<ffffffff8117c070>] ? balance_pgdat+0x750/0x750
> > [  526.738032]  [<ffffffff810ce06e>] kthread+0xbe/0xd0
> > [  526.738032]  [<ffffffff82585df4>] kernel_thread_helper+0x4/0x10
> > [  526.738032]  [<ffffffff810d8c88>] ? finish_task_switch+0x78/0x100
> > [  526.738032]  [<ffffffff825840f8>] ? retint_restore_args+0x13/0x13
> > [  526.738032]  [<ffffffff810cdfb0>] ? kthread_flush_work_fn+0x10/0x10
> > [  526.738032]  [<ffffffff82585df0>] ? gs_change+0x13/0x13
> > [  526.738032] Code: 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 80 00 00 00 00 8b 4b 68 eb ba 0f 1f 00 0f b6 4b 68 bb 01 00 00 00 d3 e3 48 63 cb eb c2 0f 0b <0f> 0b 0f 1f 40 00 55 48 89 e5 48 83 ec 60 48 89 5d d8 4c 89 65 
> > [  526.738032] RIP  [<ffffffff811c4b4a>] mem_cgroup_lru_del_list+0xca/0xd0
> > [  526.738032]  RSP <ffff8800127139a0>
> > [  526.738032] ---[ end trace 866f4f6c624b8d58 ]---
> 
> my memo here.
> 
> 1. This is caused by pc->mem_cgroup was NULL at mem_cgroup_lru_del().
> 
> 2. IIUC, PageLRU(page) should be true to cause this BUG. Then,
>    there is a page whose pc->mem_cgroup == NULL but PageLRU(page)==true.
>    But, memcg's lru_add() routine accesses pc->mem_cgroup...so it should
>    cause NULL pointer access if the page was added to LRU with pc->mem_cgroup is NULL.
> 
>    One possibility is that the page was PageLRU set but not added to memcg's LRU
>    ... added to zone's LRU directly..
>    Or PageLRU(page) was true but not added to any lru list without pc->mem_cgroup updates.
> 
> 3. IIUC, There is no routine to set pc->mem_cgroup as NULL once page is used.
>    But I need to check it....
> 
I'm very sorry ... I misunderstood this from the beginning.
The BUG_ON() that fired was
 VM_BUG_ON(MEM_CGROUP_ZSTAT(mz, lru) < (1 << compound_order(page)));

I didn't notice this one. Sorry.

So, as Hugh pointed out, that patch is the likely suspect.
As Hugh said, the patch 6699ba077ebcdeb7bde7e2644a39b9e5bf6a7e8a will be
dropped, and the issue should then disappear (I hope).
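
To illustrate the kind of accounting mismatch that check guards against, here
is a minimal userspace sketch. It is not kernel code: every name in it
(model_memcg_zone, model_page, model_lru_add, model_move_bypassing_memcg,
model_lru_del) is hypothetical, the per-LRU counter only stands in for
MEM_CGROUP_ZSTAT(mz, lru), and the list selection is simplified. It assumes
the failure mode discussed in this thread: a page is moved to another LRU
without the memcg counters being updated, so a later delete decrements the
wrong counter.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum lru_list { LRU_INACTIVE, LRU_ACTIVE, LRU_IMMEDIATE, NR_LRU };

struct model_memcg_zone {
	unsigned long count[NR_LRU];	/* models MEM_CGROUP_ZSTAT(mz, lru) */
};

struct model_page {
	enum lru_list lru;	/* the list the page currently sits on */
	unsigned int order;	/* models compound_order(page) */
};

/* Accounted add: the counter for the target list grows by the page size. */
static void model_lru_add(struct model_memcg_zone *mz,
			  struct model_page *page, enum lru_list lru)
{
	page->lru = lru;
	mz->count[lru] += 1UL << page->order;
}

/*
 * Unaccounted move: models a bare list_move() that changes the list
 * without telling memcg, so the counters are left stale.
 */
static void model_move_bypassing_memcg(struct model_page *page,
				       enum lru_list lru)
{
	page->lru = lru;
}

/*
 * Accounted delete. Returns false where the kernel check
 * VM_BUG_ON(count < (1 << order)) would have fired, i.e. where the
 * subtraction would underflow the counter.
 */
static bool model_lru_del(struct model_memcg_zone *mz,
			  struct model_page *page)
{
	unsigned long nr = 1UL << page->order;

	if (mz->count[page->lru] < nr)
		return false;
	mz->count[page->lru] -= nr;
	return true;
}
```

In this model, adding a page to LRU_INACTIVE, moving it to LRU_ACTIVE with
model_move_bypassing_memcg(), and then calling model_lru_del() makes the
delete fail, because the LRU_ACTIVE counter was never incremented. Without
the check, the counter would wrap to a huge value, which matches the
"empty list and enormous count" symptom Hugh describes.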

Thanks,
-Kame






* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  5:29         ` KAMEZAWA Hiroyuki
@ 2012-01-19  6:59           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-19  6:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, Sasha Levin, hannes, mhocko, bsingharora,
	Dave Jones, Andrew Morton, Mel Gorman, linux-kernel, cgroups,
	linux-mm

On Thu, 19 Jan 2012 14:29:34 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 18 Jan 2012 21:16:09 -0800 (PST)
> Hugh Dickins <hughd@google.com> wrote:
> 
> > On Thu, 19 Jan 2012, KAMEZAWA Hiroyuki wrote:
> > > On Wed, 18 Jan 2012 19:41:44 -0800 (PST)
> > > Hugh Dickins <hughd@google.com> wrote:
> > > > 
> > > > I notice that, unlike Linus's git, this linux-next still has
> > > > mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch in.
> > > > 
> > > > I think that was well capable of oopsing in mem_cgroup_lru_del_list(),
> > > > since it didn't always know which lru a page belongs to.
> > > > 
> > > > I'm going to be optimistic and assume that was the cause.
> > > > 
> > > Hmm, because the log hits !memcg at lru "del", the page should be added
> > > to LRU somewhere and the lru must be determined by pc->mem_cgroup.
> > > 
> > > Once set, pc->mem_cgroup is not cleared, just overwritten. AFAIK, there is
> > > only one chance to set pc->mem_cgroup as NULL... initalization.
> > > I wonder why it hits lru_del() rather than lru_add()...
> > > ................
> > > 
> > > Ahhhh, ok, it seems you are right. the patch has following kinds of codes
> > > ==
> > > +static void pagevec_putback_immediate_fn(struct page *page, void *arg)
> > > +{
> > > +       struct zone *zone = page_zone(page);
> > > +
> > > +       if (PageLRU(page)) {
> > > +               enum lru_list lru = page_lru(page);
> > > +               list_move(&page->lru, &zone->lru[lru].list);
> > > +       }
> > > +}
> > > ==
> > > ..this will bypass mem_cgroup_lru_add(), and we can see bug in lru_del()
> > > rather than lru_add()..
> > 
> > I've not thought it through in detail (and your questioning reminds me
> > that the worst I saw from that patch was updating of the wrong counts,
> > leading to underflow, then livelock from the mismatch between empty list
> > and enormous count: I never saw an oops from it, and may be mistaken).
> > 
> > > 
> > > Another question is who pushes pages to LRU before setting pc->mem_cgroup..
> > > Anyway, I think we need to fix memcg to be LRU_IMMEDIATE aware.
> > 
> > I don't think so: Mel agreed that the patch could not go forward as is,
> > without an additional pageflag, and asked Andrew to drop it from mmotm
> > in mail on 29th December (I didn't notice an mm-commits message to say
> > akpm did drop it, and marc is blacked out in protest for today, so I
> > cannot check: but certainly akpm left it out of his push to Linus).
> > 
> > Oh, and Mel noticed another bug in it on the 30th, that the PageLRU
> > check in the function you quote above is wrong: see PATCH 11/11 thread.
> 
> Sure.
> 
> Hm, what I need to find is a path which adds page to LRU bypassing memcg's check...
> 
Sorry, I misunderstood the problem entirely.
Now I think reverting the patch will help in this case.

Thanks,
-Kame



* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  5:16       ` Hugh Dickins
  2012-01-19  5:29         ` KAMEZAWA Hiroyuki
@ 2012-01-19 15:05         ` Sasha Levin
  2012-01-19 16:49         ` Mel Gorman
  2 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2012-01-19 15:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, hannes, mhocko, bsingharora, Dave Jones,
	Andrew Morton, Mel Gorman, linux-kernel, cgroups, linux-mm

On Thu, Jan 19, 2012 at 12:16 AM, Hugh Dickins <hughd@google.com> wrote:
> On Thu, 19 Jan 2012, KAMEZAWA Hiroyuki wrote:
>> On Wed, 18 Jan 2012 19:41:44 -0800 (PST)
>> Hugh Dickins <hughd@google.com> wrote:
>> >
>> > I notice that, unlike Linus's git, this linux-next still has
>> > mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch in.
>> >
>> > I think that was well capable of oopsing in mem_cgroup_lru_del_list(),
>> > since it didn't always know which lru a page belongs to.
>> >
>> > I'm going to be optimistic and assume that was the cause.
>> >
>> Hmm, because the log hits !memcg at lru "del", the page should be added
>> to LRU somewhere and the lru must be determined by pc->mem_cgroup.
>>
>> Once set, pc->mem_cgroup is not cleared, just overwritten. AFAIK, there is
>> only one chance to set pc->mem_cgroup as NULL... initalization.
>> I wonder why it hits lru_del() rather than lru_add()...
>> ................
>>
>> Ahhhh, ok, it seems you are right. the patch has following kinds of codes
>> ==
>> +static void pagevec_putback_immediate_fn(struct page *page, void *arg)
>> +{
>> +       struct zone *zone = page_zone(page);
>> +
>> +       if (PageLRU(page)) {
>> +               enum lru_list lru = page_lru(page);
>> +               list_move(&page->lru, &zone->lru[lru].list);
>> +       }
>> +}
>> ==
>> ..this will bypass mem_cgroup_lru_add(), and we can see bug in lru_del()
>> rather than lru_add()..
>
> I've not thought it through in detail (and your questioning reminds me
> that the worst I saw from that patch was updating of the wrong counts,
> leading to underflow, then livelock from the mismatch between empty list
> and enormous count: I never saw an oops from it, and may be mistaken).
>
>>
>> Another question is who pushes pages to LRU before setting pc->mem_cgroup..
>> Anyway, I think we need to fix memcg to be LRU_IMMEDIATE aware.
>
> I don't think so: Mel agreed that the patch could not go forward as is,
> without an additional pageflag, and asked Andrew to drop it from mmotm
> in mail on 29th December (I didn't notice an mm-commits message to say
> akpm did drop it, and marc is blacked out in protest for today, so I
> cannot check: but certainly akpm left it out of his push to Linus).
>
> Oh, and Mel noticed another bug in it on the 30th, that the PageLRU
> check in the function you quote above is wrong: see PATCH 11/11 thread.

So reverting this patch does indeed seem to solve the issue (though
the revert wasn't clean: there were some minor conflicts in mm/swap.c).


* Re: [BUG] kernel BUG at mm/memcontrol.c:1074!
  2012-01-19  5:16       ` Hugh Dickins
  2012-01-19  5:29         ` KAMEZAWA Hiroyuki
  2012-01-19 15:05         ` Sasha Levin
@ 2012-01-19 16:49         ` Mel Gorman
  2 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2012-01-19 16:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Sasha Levin, hannes, mhocko, bsingharora,
	Dave Jones, Andrew Morton, linux-kernel, cgroups, linux-mm

On Wed, Jan 18, 2012 at 09:16:09PM -0800, Hugh Dickins wrote:
> > 
> > Another question is who pushes pages to LRU before setting pc->mem_cgroup..
> > Anyway, I think we need to fix memcg to be LRU_IMMEDIATE aware.
> 
> I don't think so: Mel agreed that the patch could not go forward as is,
> without an additional pageflag, and asked Andrew to drop it from mmotm
> in mail on 29th December (I didn't notice an mm-commits message to say
> akpm did drop it, and marc is blacked out in protest for today, so I
> cannot check: but certainly akpm left it out of his push to Linus).
> 
> Oh, and Mel noticed another bug in it on the 30th, that the PageLRU
> check in the function you quote above is wrong: see PATCH 11/11 thread.
> 

Yes, that patch is broken. According to the mm-commits list, it was
"withdrawn" on December 30th. I do not know why it is still in
linux-next but AFAIK, it is not expected to end up in mainline. I do not
have a fixed version of the patch at the moment.

-- 
Mel Gorman
SUSE Labs


Thread overview: 11+ messages
2012-01-19  5:10 [BUG] kernel BUG at mm/memcontrol.c:1074! Sasha Levin
2012-01-19  3:23 ` KAMEZAWA Hiroyuki
2012-01-19  3:41   ` Hugh Dickins
2012-01-19  4:03     ` KAMEZAWA Hiroyuki
2012-01-19  5:16       ` Hugh Dickins
2012-01-19  5:29         ` KAMEZAWA Hiroyuki
2012-01-19  6:59           ` KAMEZAWA Hiroyuki
2012-01-19 15:05         ` Sasha Levin
2012-01-19 16:49         ` Mel Gorman
2012-01-19  5:52 ` KAMEZAWA Hiroyuki
2012-01-19  6:44   ` KAMEZAWA Hiroyuki
