linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@suse.cz>
To: Glauber Costa <glommer@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Tue, 18 Jun 2013 10:24:14 +0200
Message-ID: <20130618082414.GC13677@dhcp22.suse.cz>
In-Reply-To: <20130618063104.GB20528@localhost.localdomain>

On Tue 18-06-13 10:31:05, Glauber Costa wrote:
> On Tue, Jun 18, 2013 at 12:46:23PM +1000, Dave Chinner wrote:
> > On Tue, Jun 18, 2013 at 02:30:05AM +0400, Glauber Costa wrote:
> > > On Mon, Jun 17, 2013 at 02:35:08PM -0700, Andrew Morton wrote:
> > > > On Mon, 17 Jun 2013 19:14:12 +0400 Glauber Costa <glommer@gmail.com> wrote:
> > > > 
> > > > > > I managed to trigger:
> > > > > > [ 1015.776029] kernel BUG at mm/list_lru.c:92!
> > > > > > [ 1015.776029] invalid opcode: 0000 [#1] SMP
> > > > > > with Linux next (next-20130607) with https://lkml.org/lkml/2013/6/17/203
> > > > > > on top. 
> > > > > > 
> > > > > > This is obviously BUG_ON(nlru->nr_items < 0) and 
> > > > > > ffffffff81122d0b:       48 85 c0                test   %rax,%rax
> > > > > > ffffffff81122d0e:       49 89 44 24 18          mov    %rax,0x18(%r12)
> > > > > > ffffffff81122d13:       0f 84 87 00 00 00       je     ffffffff81122da0 <list_lru_walk_node+0x110>
> > > > > > ffffffff81122d19:       49 83 7c 24 18 00       cmpq   $0x0,0x18(%r12)
> > > > > > ffffffff81122d1f:       78 7b                   js     ffffffff81122d9c <list_lru_walk_node+0x10c>
> > > > > > [...]
> > > > > > ffffffff81122d9c:       0f 0b                   ud2
> > > > > > 
> > > > > > RAX is -1UL.
> > > > > > Yes, fearing that kind of imbalance, we decided to leave the counter as a signed quantity
> > > > > > and BUG, instead of an unsigned quantity.
> > > > > 
> > > > > > 
> > > > > > I assume that the current backtrace is of no use and it would most
> > > > > > probably be some shrinker which doesn't behave.
> > > > > > 
> > > > > There are currently 3 users of list_lru in tree: dentries, inodes and xfs.
> > > > > Assuming you are not using xfs, we are left with dentries and inodes.
> > > > > 
> > > > > > The first thing to do is to find which one of them is misbehaving. You can try finding
> > > > > > this out by the address of the list_lru, and where it lies in the superblock.
> > > > > > 
> > > > > > Once we know which of them is misbehaving, then we'll have to figure out why.
> > > > 
> > > > The trace says shrink_slab_node->super_cache_scan->prune_icache_sb.  So
> > > > it's inodes?
> > > > 
> > > > Assuming there is no memory corruption of any sort going on, let's check the code.
> > > > nr_items is only manipulated in 3 places:
> > > > 
> > > > 1) list_lru_add, where it is increased
> > > > 2) list_lru_del, where it is decreased in case the user has voluntarily removed the
> > > >    element from the list
> > > > 3) list_lru_walk_node, where an element is removed during shrink.
> > > 
> > > All three excerpts seem to be correctly locked, so something like this indicates an imbalance.
> > 
> > inode_lru_isolate() looks suspicious to me:
> > 
> >         WARN_ON(inode->i_state & I_NEW);
> >         inode->i_state |= I_FREEING;
> >         spin_unlock(&inode->i_lock);
> > 
> >         list_move(&inode->i_lru, freeable);
> >         this_cpu_dec(nr_unused);
> >         return LRU_REMOVED;
> > }
> > 
> > All the other cases where I_FREEING is set and the inode is removed
> > from the LRU are completely done under the inode->i_lock. i.e. from
> > an external POV, the state change to I_FREEING and removal from LRU
> > are supposed to be atomic, but they are not here.
> > 
> > I'm not sure this is the source of the problem, but it definitely
> > needs fixing.
> > 
> Yes, I missed that yesterday, but that does look suspicious to me as well.
> 
> Michal, if you can manually move this one inside the lock and see
> if it fixes your problem... Otherwise I can send you a patch
> so we don't get lost on what is patched and what is not.

OK, I am testing with this now:
diff --git a/fs/inode.c b/fs/inode.c
index 604c15e..95e598c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -733,9 +733,9 @@ inode_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg)
 
 	WARN_ON(inode->i_state & I_NEW);
 	inode->i_state |= I_FREEING;
+	list_move(&inode->i_lru, freeable);
 	spin_unlock(&inode->i_lock);
 
-	list_move(&inode->i_lru, freeable);
 	this_cpu_dec(nr_unused);
 	return LRU_REMOVED;
 }

> Let us at least know if this is the problem.
> 
> > > callers:
> > > iput_final, evict_inodes, invalidate_inodes.
> > > Both evict_inodes and invalidate_inodes will do the following pattern:
> > > 
> > >                 inode->i_state |= I_FREEING;                                            
> > >                 inode_lru_list_del(inode);
> > >                 spin_unlock(&inode->i_lock);
> > >                 list_add(&inode->i_lru, &dispose);
> > > 
> > > IOW, they will remove the element from the LRU, and add it to the dispose list.
> > > Both of them will also bail out if they see I_FREEING already set, so they are safe
> > > against each other - because the flag is manipulated inside the lock.
> > > 
> > > But how about iput_final? It seems to me that if we are calling iput_final at the
> > > same time as the other two, this *could* happen (maybe there is some extra protection
> > > that can be seen from Australia but not from here. Dave?)
> > 
> > If I_FREEING is set before we enter iput_final(), then something
> > else is screwed up. I_FREEING is only set once the last reference
> > has gone away and we are killing the inode. All the other callers
> > that set I_FREEING check that the reference count on the inode is
> > zero before they set I_FREEING. Hence I_FREEING cannot be set on the
> > transition of i_count from 1 to 0 when iput_final() is called. So
> > the patch won't do anything to avoid the problem being seen.
> > 
> Yes, but aren't things like evict_inodes and invalidate_inodes called at
> umount time, for instance?

JFYI, no unmount is going on in my test case.
-- 
Michal Hocko
SUSE Labs
