All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Mon, 1 Jul 2013 18:10:56 +1000	[thread overview]
Message-ID: <20130701081056.GA4072@dastard> (raw)
In-Reply-To: <20130701075005.GA28765@dhcp22.suse.cz>

On Mon, Jul 01, 2013 at 09:50:05AM +0200, Michal Hocko wrote:
> On Mon 01-07-13 11:25:58, Dave Chinner wrote:
> > On Sun, Jun 30, 2013 at 08:33:49PM +0200, Michal Hocko wrote:
> > > On Sat 29-06-13 12:55:09, Dave Chinner wrote:
> > > > On Thu, Jun 27, 2013 at 04:54:11PM +0200, Michal Hocko wrote:
> > > > > On Thu 27-06-13 09:24:26, Dave Chinner wrote:
> > > > > > On Wed, Jun 26, 2013 at 10:15:09AM +0200, Michal Hocko wrote:
> > > > > > > On Tue 25-06-13 12:27:54, Dave Chinner wrote:
> > > > > > > > On Tue, Jun 18, 2013 at 03:50:25PM +0200, Michal Hocko wrote:
> > > > > > > > > And again, another hang. It looks like the inode deletion never
> > > > > > > > > finishes. The good thing is that I do not see any LRU related BUG_ONs
> > > > > > > > > anymore. I am going to test with the other patch in the thread.
> > > > > > > > > 
> > > > > > > > > 2476 [<ffffffff8118325e>] __wait_on_freeing_inode+0x9e/0xc0	<<< waiting for an inode to go away
> > > > > > > > > [<ffffffff81183321>] find_inode_fast+0xa1/0xc0
> > > > > > > > > [<ffffffff8118525f>] iget_locked+0x4f/0x180
> > > > > > > > > [<ffffffff811ef9e3>] ext4_iget+0x33/0x9f0
> > > > > > > > > [<ffffffff811f6a1c>] ext4_lookup+0xbc/0x160
> > > > > > > > > [<ffffffff81174ad0>] lookup_real+0x20/0x60
> > > > > > > > > [<ffffffff81177e25>] lookup_open+0x175/0x1d0
> > > > > > > > > [<ffffffff8117815e>] do_last+0x2de/0x780			<<< holds i_mutex
> > > > > > > > > [<ffffffff8117ae9a>] path_openat+0xda/0x400
> > > > > > > > > [<ffffffff8117b303>] do_filp_open+0x43/0xa0
> > > > > > > > > [<ffffffff81168ee0>] do_sys_open+0x160/0x1e0
> > > > > > > > > [<ffffffff81168f9c>] sys_open+0x1c/0x20
> > > > > > > > > [<ffffffff81582fe9>] system_call_fastpath+0x16/0x1b
> > > > > > > > > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > .....
> > > Do you mean sysrq+t? It is attached. 
> > > 
> > > Btw. I was able to reproduce this again. The stuck processes were
> > > sitting in the same traces for more than 28 hours without any change so
> > > I do not think this is a temporal condition.
> > > 
> > > Traces of all processes in the D state:
> > > 7561 [<ffffffffa029c03e>] xfs_iget+0xbe/0x190 [xfs]
> > > [<ffffffffa02a8e98>] xfs_lookup+0xe8/0x110 [xfs]
> > > [<ffffffffa029fad9>] xfs_vn_lookup+0x49/0x90 [xfs]
> > > [<ffffffff81174ad0>] lookup_real+0x20/0x60
> > > [<ffffffff81177e25>] lookup_open+0x175/0x1d0
> > > [<ffffffff8117815e>] do_last+0x2de/0x780
> > > [<ffffffff8117ae9a>] path_openat+0xda/0x400
> > > [<ffffffff8117b303>] do_filp_open+0x43/0xa0
> > > [<ffffffff81168ee0>] do_sys_open+0x160/0x1e0
> > > [<ffffffff81168f9c>] sys_open+0x1c/0x20
> > > [<ffffffff815830e9>] system_call_fastpath+0x16/0x1b
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > This looks like it may be equivalent to the ext4 trace above, though
> > I'm not totally sure on that yet. Can you get me the line of code
> > where the above code is sleeping - 'gdb> l *(xfs_iget+0xbe)' output
> > is sufficient.
> 
> OK, this is a bit tricky because I have xfs built as a module so objdump
> on xfs.ko shows nonsense
>    19039:       e8 00 00 00 00          callq  1903e <xfs_iget+0xbe>
>    1903e:       48 8b 75 c0             mov    -0x40(%rbp),%rsi
> 
> crash was more clever though and it says:
> 0xffffffffa029c034 <xfs_iget+180>:      mov    $0x1,%edi
> 0xffffffffa029c039 <xfs_iget+185>:      callq  0xffffffff815776d0
> <schedule_timeout_uninterruptible>
> /dev/shm/mhocko-build/BUILD/kernel-3.9.0mmotm+/fs/xfs/xfs_icache.c: 423
> 0xffffffffa029c03e <xfs_iget+190>:      mov    -0x40(%rbp),%rsi
> 
> which maps to:
> out_error_or_again:
>         if (error == EAGAIN) {
>                 delay(1);
>                 goto again;
>         }
> 
> So this looks like this path loops in goto again and out_error_or_again.

Yup, that's what I suspected.

> > If it's where I suspect it is, we are hitting a VFS inode that
> > igrab() is failing on because I_FREEING is set and that is returning
> > EAGAIN. Hence xfs_iget() sleeps for a short period and retries the
> > lookup. If you've still got a system in this state, can you dump the
> > xfs stats a few times about 5s apart i.e.
> > 
> > $ for i in `seq 0 1 5`; do echo ; date; cat /proc/fs/xfs/stat ; sleep 5 ; done
> > 
> > Depending on what stat is changing (i'm looking for skip vs recycle
> > in the inode cache stats), that will tell us why the lookup is
> > failing...
> 
> $ for i in `seq 0 1 5`; do echo ; date; cat /proc/fs/xfs/stat ; sleep 5 ; done
> 
> Mon Jul  1 09:29:57 CEST 2013
> extent_alloc 1484333 2038118 1678 13182
> abt 0 0 0 0
> blk_map 21004635 3433178 1450438 1461372 1450017 25888309 0
> bmbt 0 0 0 0
> dir 1482235 1466711 7281 2529
> trans 7676 6231535 1444850
> ig 0 8534 299 1463749 0 1256778 262381
            ^^^

That is the recycle stat, which indicates we've found an inode being
reclaimed. When it's found an inode that have been evicted, but not
yet reclaimed at the XFS level, that stat will increase. If the
inode is still valid at the VFS level, and igrab() fails, then we'll
get EAGAIN without that stat being increased. So, igrab() is
failing, and that means I_FREEING|I_WILL_FREE are set.

So, it looks to be the same case as the ext4 hang, and it's likely
that we have some dangling inode dispose list somewhere. So, here's
the fun part. Use tracing to grab the inode number that is stuck
(tracepoint xfs::xfs_iget_skip), and then run crash on the live
kernel on the process that is looping, and find the struct xfs_inode
and print it.  Use the inode number from the trace point to check
you've got the right inode.

Th struct inode of the VFS inode is embedded into the struct
xfs_inode, and the dispose list that it is on should be the on the
inode->i_lru_list. What that, and see how many other inodes are on
that list. Once we know if it's a single inode, and whether the
dispose list it is on is intact, empty or corrupt, we might have a
better idea of how these inodes are getting lost....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Mon, 1 Jul 2013 18:10:56 +1000	[thread overview]
Message-ID: <20130701081056.GA4072@dastard> (raw)
In-Reply-To: <20130701075005.GA28765@dhcp22.suse.cz>

On Mon, Jul 01, 2013 at 09:50:05AM +0200, Michal Hocko wrote:
> On Mon 01-07-13 11:25:58, Dave Chinner wrote:
> > On Sun, Jun 30, 2013 at 08:33:49PM +0200, Michal Hocko wrote:
> > > On Sat 29-06-13 12:55:09, Dave Chinner wrote:
> > > > On Thu, Jun 27, 2013 at 04:54:11PM +0200, Michal Hocko wrote:
> > > > > On Thu 27-06-13 09:24:26, Dave Chinner wrote:
> > > > > > On Wed, Jun 26, 2013 at 10:15:09AM +0200, Michal Hocko wrote:
> > > > > > > On Tue 25-06-13 12:27:54, Dave Chinner wrote:
> > > > > > > > On Tue, Jun 18, 2013 at 03:50:25PM +0200, Michal Hocko wrote:
> > > > > > > > > And again, another hang. It looks like the inode deletion never
> > > > > > > > > finishes. The good thing is that I do not see any LRU related BUG_ONs
> > > > > > > > > anymore. I am going to test with the other patch in the thread.
> > > > > > > > > 
> > > > > > > > > 2476 [<ffffffff8118325e>] __wait_on_freeing_inode+0x9e/0xc0	<<< waiting for an inode to go away
> > > > > > > > > [<ffffffff81183321>] find_inode_fast+0xa1/0xc0
> > > > > > > > > [<ffffffff8118525f>] iget_locked+0x4f/0x180
> > > > > > > > > [<ffffffff811ef9e3>] ext4_iget+0x33/0x9f0
> > > > > > > > > [<ffffffff811f6a1c>] ext4_lookup+0xbc/0x160
> > > > > > > > > [<ffffffff81174ad0>] lookup_real+0x20/0x60
> > > > > > > > > [<ffffffff81177e25>] lookup_open+0x175/0x1d0
> > > > > > > > > [<ffffffff8117815e>] do_last+0x2de/0x780			<<< holds i_mutex
> > > > > > > > > [<ffffffff8117ae9a>] path_openat+0xda/0x400
> > > > > > > > > [<ffffffff8117b303>] do_filp_open+0x43/0xa0
> > > > > > > > > [<ffffffff81168ee0>] do_sys_open+0x160/0x1e0
> > > > > > > > > [<ffffffff81168f9c>] sys_open+0x1c/0x20
> > > > > > > > > [<ffffffff81582fe9>] system_call_fastpath+0x16/0x1b
> > > > > > > > > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > .....
> > > Do you mean sysrq+t? It is attached. 
> > > 
> > > Btw. I was able to reproduce this again. The stuck processes were
> > > sitting in the same traces for more than 28 hours without any change so
> > > I do not think this is a temporal condition.
> > > 
> > > Traces of all processes in the D state:
> > > 7561 [<ffffffffa029c03e>] xfs_iget+0xbe/0x190 [xfs]
> > > [<ffffffffa02a8e98>] xfs_lookup+0xe8/0x110 [xfs]
> > > [<ffffffffa029fad9>] xfs_vn_lookup+0x49/0x90 [xfs]
> > > [<ffffffff81174ad0>] lookup_real+0x20/0x60
> > > [<ffffffff81177e25>] lookup_open+0x175/0x1d0
> > > [<ffffffff8117815e>] do_last+0x2de/0x780
> > > [<ffffffff8117ae9a>] path_openat+0xda/0x400
> > > [<ffffffff8117b303>] do_filp_open+0x43/0xa0
> > > [<ffffffff81168ee0>] do_sys_open+0x160/0x1e0
> > > [<ffffffff81168f9c>] sys_open+0x1c/0x20
> > > [<ffffffff815830e9>] system_call_fastpath+0x16/0x1b
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > This looks like it may be equivalent to the ext4 trace above, though
> > I'm not totally sure on that yet. Can you get me the line of code
> > where the above code is sleeping - 'gdb> l *(xfs_iget+0xbe)' output
> > is sufficient.
> 
> OK, this is a bit tricky because I have xfs built as a module so objdump
> on xfs.ko shows nonsense
>    19039:       e8 00 00 00 00          callq  1903e <xfs_iget+0xbe>
>    1903e:       48 8b 75 c0             mov    -0x40(%rbp),%rsi
> 
> crash was more clever though and it says:
> 0xffffffffa029c034 <xfs_iget+180>:      mov    $0x1,%edi
> 0xffffffffa029c039 <xfs_iget+185>:      callq  0xffffffff815776d0
> <schedule_timeout_uninterruptible>
> /dev/shm/mhocko-build/BUILD/kernel-3.9.0mmotm+/fs/xfs/xfs_icache.c: 423
> 0xffffffffa029c03e <xfs_iget+190>:      mov    -0x40(%rbp),%rsi
> 
> which maps to:
> out_error_or_again:
>         if (error == EAGAIN) {
>                 delay(1);
>                 goto again;
>         }
> 
> So this looks like this path loops in goto again and out_error_or_again.

Yup, that's what I suspected.

> > If it's where I suspect it is, we are hitting a VFS inode that
> > igrab() is failing on because I_FREEING is set and that is returning
> > EAGAIN. Hence xfs_iget() sleeps for a short period and retries the
> > lookup. If you've still got a system in this state, can you dump the
> > xfs stats a few times about 5s apart i.e.
> > 
> > $ for i in `seq 0 1 5`; do echo ; date; cat /proc/fs/xfs/stat ; sleep 5 ; done
> > 
> > Depending on what stat is changing (i'm looking for skip vs recycle
> > in the inode cache stats), that will tell us why the lookup is
> > failing...
> 
> $ for i in `seq 0 1 5`; do echo ; date; cat /proc/fs/xfs/stat ; sleep 5 ; done
> 
> Mon Jul  1 09:29:57 CEST 2013
> extent_alloc 1484333 2038118 1678 13182
> abt 0 0 0 0
> blk_map 21004635 3433178 1450438 1461372 1450017 25888309 0
> bmbt 0 0 0 0
> dir 1482235 1466711 7281 2529
> trans 7676 6231535 1444850
> ig 0 8534 299 1463749 0 1256778 262381
            ^^^

That is the recycle stat, which indicates we've found an inode being
reclaimed. When it's found an inode that have been evicted, but not
yet reclaimed at the XFS level, that stat will increase. If the
inode is still valid at the VFS level, and igrab() fails, then we'll
get EAGAIN without that stat being increased. So, igrab() is
failing, and that means I_FREEING|I_WILL_FREE are set.

So, it looks to be the same case as the ext4 hang, and it's likely
that we have some dangling inode dispose list somewhere. So, here's
the fun part. Use tracing to grab the inode number that is stuck
(tracepoint xfs::xfs_iget_skip), and then run crash on the live
kernel on the process that is looping, and find the struct xfs_inode
and print it.  Use the inode number from the trace point to check
you've got the right inode.

Th struct inode of the VFS inode is embedded into the struct
xfs_inode, and the dispose list that it is on should be the on the
inode->i_lru_list. What that, and see how many other inodes are on
that list. Once we know if it's a single inode, and whether the
dispose list it is on is intact, empty or corrupt, we might have a
better idea of how these inodes are getting lost....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-07-01  8:11 UTC|newest]

Thread overview: 127+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-17 14:18 linux-next: slab shrinkers: BUG at mm/list_lru.c:92 Michal Hocko
2013-06-17 14:18 ` Michal Hocko
2013-06-17 15:14 ` Glauber Costa
2013-06-17 15:14   ` Glauber Costa
2013-06-17 15:33   ` Michal Hocko
2013-06-17 15:33     ` Michal Hocko
2013-06-17 16:54     ` Glauber Costa
2013-06-17 16:54       ` Glauber Costa
2013-06-18  7:42       ` Michal Hocko
2013-06-18  7:42         ` Michal Hocko
2013-06-17 21:35   ` Andrew Morton
2013-06-17 21:35     ` Andrew Morton
2013-06-17 22:30     ` Glauber Costa
2013-06-18  2:46       ` Dave Chinner
2013-06-18  2:46         ` Dave Chinner
2013-06-18  6:31         ` Glauber Costa
2013-06-18  6:31           ` Glauber Costa
2013-06-18  8:24           ` Michal Hocko
2013-06-18  8:24             ` Michal Hocko
2013-06-18 10:44             ` Michal Hocko
2013-06-18 10:44               ` Michal Hocko
2013-06-18 13:50               ` Michal Hocko
2013-06-18 13:50                 ` Michal Hocko
2013-06-25  2:27                 ` Dave Chinner
2013-06-25  2:27                   ` Dave Chinner
2013-06-26  8:15                   ` Michal Hocko
2013-06-26  8:15                     ` Michal Hocko
2013-06-26 23:24                     ` Dave Chinner
2013-06-26 23:24                       ` Dave Chinner
2013-06-27 14:54                       ` Michal Hocko
2013-06-27 14:54                         ` Michal Hocko
2013-06-28  8:39                         ` Michal Hocko
2013-06-28  8:39                           ` Michal Hocko
2013-06-28 14:31                           ` Glauber Costa
2013-06-28 14:31                             ` Glauber Costa
2013-06-28 15:12                             ` Michal Hocko
2013-06-28 15:12                               ` Michal Hocko
2013-06-29  2:55                         ` Dave Chinner
2013-06-29  2:55                           ` Dave Chinner
2013-06-30 18:33                           ` Michal Hocko
2013-07-01  1:25                             ` Dave Chinner
2013-07-01  1:25                               ` Dave Chinner
2013-07-01  7:50                               ` Michal Hocko
2013-07-01  7:50                                 ` Michal Hocko
2013-07-01  8:10                                 ` Dave Chinner [this message]
2013-07-01  8:10                                   ` Dave Chinner
2013-07-02  9:22                                   ` Michal Hocko
2013-07-02 12:19                                     ` Dave Chinner
2013-07-02 12:19                                       ` Dave Chinner
2013-07-02 12:44                                       ` Michal Hocko
2013-07-02 12:44                                         ` Michal Hocko
2013-07-03 11:24                                         ` Dave Chinner
2013-07-03 11:24                                           ` Dave Chinner
2013-07-03 14:08                                           ` Glauber Costa
2013-07-03 14:08                                             ` Glauber Costa
2013-07-04 16:36                                           ` Michal Hocko
2013-07-04 16:36                                             ` Michal Hocko
2013-07-08 12:53                                             ` Michal Hocko
2013-07-08 21:04                                               ` Andrew Morton
2013-07-08 21:04                                                 ` Andrew Morton
2013-07-09 17:34                                                 ` Glauber Costa
2013-07-09 17:34                                                   ` Glauber Costa
2013-07-09 17:51                                                   ` Andrew Morton
2013-07-09 17:51                                                     ` Andrew Morton
2013-07-09 17:32                                               ` Glauber Costa
2013-07-09 17:32                                                 ` Glauber Costa
2013-07-09 17:50                                                 ` Andrew Morton
2013-07-09 17:50                                                   ` Andrew Morton
2013-07-09 17:57                                                   ` Glauber Costa
2013-07-09 17:57                                                     ` Glauber Costa
2013-07-09 17:57                                                 ` Michal Hocko
2013-07-09 17:57                                                   ` Michal Hocko
2013-07-09 21:39                                                   ` Andrew Morton
2013-07-09 21:39                                                     ` Andrew Morton
2013-07-10  2:31                                               ` Dave Chinner
2013-07-10  2:31                                                 ` Dave Chinner
2013-07-10  7:34                                                 ` Michal Hocko
2013-07-10  7:34                                                   ` Michal Hocko
2013-07-10  8:06                                                 ` Michal Hocko
2013-07-10  8:06                                                   ` Michal Hocko
2013-07-11  2:26                                                   ` Dave Chinner
2013-07-11  2:26                                                     ` Dave Chinner
2013-07-11  3:03                                                     ` Andrew Morton
2013-07-11  3:03                                                       ` Andrew Morton
2013-07-11 13:23                                                     ` Michal Hocko
2013-07-11 13:23                                                       ` Michal Hocko
2013-07-12  1:42                                                       ` Hugh Dickins
2013-07-12  1:42                                                         ` Hugh Dickins
2013-07-13  3:29                                                         ` Dave Chinner
2013-07-13  3:29                                                           ` Dave Chinner
2013-07-15  9:14                                             ` Michal Hocko
2013-07-15  9:14                                               ` Michal Hocko
2013-06-18  6:26       ` Glauber Costa
2013-06-18  8:25         ` Michal Hocko
2013-06-18  8:25           ` Michal Hocko
2013-06-19  7:13         ` Michal Hocko
2013-06-19  7:13           ` Michal Hocko
2013-06-19  7:35           ` Glauber Costa
2013-06-19  7:35             ` Glauber Costa
2013-06-19  8:52             ` Glauber Costa
2013-06-19  8:52               ` Glauber Costa
2013-06-19 13:57             ` Michal Hocko
2013-06-19 13:57               ` Michal Hocko
2013-06-19 14:02               ` Glauber Costa
2013-06-19 14:02                 ` Glauber Costa
2013-06-19 14:28           ` Michal Hocko
2013-06-19 14:28             ` Michal Hocko
2013-06-20 14:11             ` Glauber Costa
2013-06-20 14:11               ` Glauber Costa
2013-06-20 15:12               ` Michal Hocko
2013-06-20 15:16                 ` Michal Hocko
2013-06-20 15:16                   ` Michal Hocko
2013-06-21  9:00                 ` Michal Hocko
2013-06-21  9:00                   ` Michal Hocko
2013-06-23 11:51                   ` Glauber Costa
2013-06-23 11:51                     ` Glauber Costa
2013-06-23 11:55                     ` Glauber Costa
2013-06-25  2:29                     ` Dave Chinner
2013-06-25  2:29                       ` Dave Chinner
2013-06-26  8:22                     ` Michal Hocko
2013-06-26  8:22                       ` Michal Hocko
2013-06-18  8:19       ` Michal Hocko
2013-06-18  8:19         ` Michal Hocko
2013-06-18  8:21         ` Glauber Costa
2013-06-18  8:21           ` Glauber Costa
2013-06-18  8:26           ` Michal Hocko
2013-06-18  8:26             ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130701081056.GA4072@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=glommer@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.