From: Glauber Costa <glommer@gmail.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>, Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Fri, 28 Jun 2013 18:31:26 +0400
Message-ID: <20130628143124.GA6552@localhost.localdomain>
In-Reply-To: <20130628083943.GA32747@dhcp22.suse.cz>

On Fri, Jun 28, 2013 at 10:39:43AM +0200, Michal Hocko wrote:
> I have just triggered this one.
>
> [37955.364062] RIP: 0010:[<ffffffff81127e5b>] [<ffffffff81127e5b>] list_lru_walk_node+0xab/0x140
> [37955.364062] RSP: 0000:ffff8800374af7b8 EFLAGS: 00010286
> [37955.364062] RAX: 0000000000000106 RBX: ffff88002ead7838 RCX: ffff8800374af830

Note ebx

> [37955.364062] RDX: 0000000000000107 RSI: ffff88001d250dc0 RDI: ffff88002ead77d0
> [37955.364062] RBP: ffff8800374af818 R08: 0000000000000000 R09: ffff88001ffeafc0
> [37955.364062] R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001d250dc0
> [37955.364062] R13: 00000000000000a0 R14: 000000572ead7838 R15: ffff88001d250dc8

Note r14

> [37955.364062] Process as (pid: 3351, threadinfo ffff8800374ae000, task ffff880036d665c0)
> [37955.364062] Stack:
> [37955.364062] ffff88001da3e700 ffff8800374af830 ffff8800374af838 ffffffff811846d0
> [37955.364062] 0000000000000000 ffff88001ce75c48 01ff8800374af838 ffff8800374af838
> [37955.364062] 0000000000000000 ffff88001ce75800 ffff8800374afa08 0000000000001014
> [37955.364062] Call Trace:
> [37955.364062] [<ffffffff811846d0>] ? insert_inode_locked+0x160/0x160
> [37955.364062] [<ffffffff8118496c>] prune_icache_sb+0x3c/0x60
> [37955.364062] [<ffffffff8116dcbe>] super_cache_scan+0x12e/0x1b0
> [37955.364062] [<ffffffff8111354a>] shrink_slab_node+0x13a/0x250
> [37955.364062] [<ffffffff8111671b>] shrink_slab+0xab/0x120
> [37955.364062] [<ffffffff81117944>] do_try_to_free_pages+0x264/0x360
> [37955.364062] [<ffffffff81117d90>] try_to_free_pages+0x130/0x180
> [37955.364062] [<ffffffff81001974>] ? __switch_to+0x1b4/0x550
> [37955.364062] [<ffffffff8110a2fe>] __alloc_pages_slowpath+0x39e/0x790
> [37955.364062] [<ffffffff8110a8ea>] __alloc_pages_nodemask+0x1fa/0x210
> [37955.364062] [<ffffffff8114d1b0>] alloc_pages_vma+0xa0/0x120
> [37955.364062] [<ffffffff81129ebb>] do_anonymous_page+0x16b/0x350
> [37955.364062] [<ffffffff8112f9c5>] handle_pte_fault+0x235/0x240
> [37955.364062] [<ffffffff8107b8b0>] ? set_next_entity+0xb0/0xd0
> [37955.364062] [<ffffffff8112fcbf>] handle_mm_fault+0x2ef/0x400
> [37955.364062] [<ffffffff8157e927>] __do_page_fault+0x237/0x4f0
> [37955.364062] [<ffffffff8116a8a8>] ? fsnotify_access+0x68/0x80
> [37955.364062] [<ffffffff8116b0b8>] ? vfs_read+0xd8/0x130
> [37955.364062] [<ffffffff8157ebe9>] do_page_fault+0x9/0x10
> [37955.364062] [<ffffffff8157b348>] page_fault+0x28/0x30
> [37955.364062] Code: 44 24 18 0f 84 87 00 00 00 49 83 7c 24 18 00 78 7b 49 83 c5 01 48 8b 4d a8 48 8b 11 48 8d 42 ff 48 85 d2 48 89 01 74 78 4d 39 f7 <49> 8b 06 4c 89 f3 74 6d 49 89 c6 eb a6 0f 1f 84 00 00 00 00 00
> [37955.364062] RIP [<ffffffff81127e5b>] list_lru_walk_node+0xab/0x140
>
> ffffffff81127e0e: 48 8b 55 b0     mov    -0x50(%rbp),%rdx
> ffffffff81127e12: 4c 89 e6        mov    %r12,%rsi
> ffffffff81127e15: 48 89 df        mov    %rbx,%rdi
> ffffffff81127e18: ff 55 b8        callq  *-0x48(%rbp)  # isolate(item, &nlru->lock, cb_arg)
> ffffffff81127e1b: 83 f8 01        cmp    $0x1,%eax
> ffffffff81127e1e: 74 78           je     ffffffff81127e98 <list_lru_walk_node+0xe8>
> ffffffff81127e20: 73 4e           jae    ffffffff81127e70 <list_lru_walk_node+0xc0>
> [...]

One interesting thing I noticed here is that r14 is basically the lower half of rbx, with the upper half corrupted. Because we are talking about a single word, this does not look like the usual problem of updating half of a double word without locking.

From your excerpt, it is not totally clear what r14 is. But rdi is 0xffff88002ead77d0, which is very probably nlru->lock given the calling convention; that would indicate that this is nlru->list, if you have spinlock debugging enabled.

So yes, someone destroyed our next pointer, and amazingly only half of it.

Still, the only time we ever release this lock is when isolate returns LRU_RETRY. Maybe the way we restart is wrong (although I can't see how)? An iput() happens outside the lock in that case, but it seems safe: if it ends up manipulating the LRU, it will do so through our accessors.

I will have to think a bit more... Did anything else strange happen before it?