From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Thu, 11 Jul 2013 12:26:34 +1000
Message-ID: <20130711022634.GZ3438@dastard>
In-Reply-To: <20130710080605.GC4437@dhcp22.suse.cz>

On Wed, Jul 10, 2013 at 10:06:05AM +0200, Michal Hocko wrote:
> On Wed 10-07-13 12:31:39, Dave Chinner wrote:
> [...]
> > > 20761 [<ffffffffa0305fdd>] xlog_grant_head_wait+0xdd/0x1a0 [xfs]
> > > [<ffffffffa0306166>] xlog_grant_head_check+0xc6/0xe0 [xfs]
> > > [<ffffffffa030627f>] xfs_log_reserve+0xff/0x240 [xfs]
> > > [<ffffffffa0302ac4>] xfs_trans_reserve+0x234/0x240 [xfs]
> > > [<ffffffffa02c5999>] xfs_create+0x1a9/0x5c0 [xfs]
> > > [<ffffffffa02bccca>] xfs_vn_mknod+0x8a/0x1a0 [xfs]
> > > [<ffffffffa02bce0e>] xfs_vn_create+0xe/0x10 [xfs]
> > > [<ffffffff811763dd>] vfs_create+0xad/0xd0
> > > [<ffffffff81177e68>] lookup_open+0x1b8/0x1d0
> > > [<ffffffff8117815e>] do_last+0x2de/0x780
> > > [<ffffffff8117ae9a>] path_openat+0xda/0x400
> > > [<ffffffff8117b303>] do_filp_open+0x43/0xa0
> > > [<ffffffff81168ee0>] do_sys_open+0x160/0x1e0
> > > [<ffffffff81168f9c>] sys_open+0x1c/0x20
> > > [<ffffffff815830e9>] system_call_fastpath+0x16/0x1b
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > That's an XFS log space issue, indicating that it has run out of
> > space in the log and it is waiting for more to come free. That
> > requires IO completion to occur.
> >
> > > [276962.652076] INFO: task xfs-data/sda9:930 blocked for more than 480 seconds.
> > > [276962.652087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [276962.652093] xfs-data/sda9   D ffff88001ffb9cc8     0   930      2 0x00000000
> > 
> > Oh, that's why. This is the IO completion worker...
> 
> But that task doesn't seem to be stuck anymore (at least lockup watchdog
> doesn't report it anymore and I have already rebooted to test with ext3
> :/). I am sorry if these lockup logs were more confusing than
> helpful, but they happened a _long_ time ago and the system obviously
> recovered from them. I am pasting only the traces for processes in D
> state here again for reference.

Right, there are various triggers that can get XFS out of the
situation - it takes something to kick the log or metadata writeback
and that can make space in the log free up and hence things get
moving again. The problem will be that once in this low memory state
everything in the filesystem will back up on slow memory allocation
and it might take minutes to clear the backlog of IO completions....

> 20757 [<ffffffffa0305fdd>] xlog_grant_head_wait+0xdd/0x1a0 [xfs]
> [<ffffffffa0306166>] xlog_grant_head_check+0xc6/0xe0 [xfs]
> [<ffffffffa030627f>] xfs_log_reserve+0xff/0x240 [xfs]
> [<ffffffffa0302ac4>] xfs_trans_reserve+0x234/0x240 [xfs]

That is the stack of a process waiting for log space to become
available.

> We are waiting for a page under writeback but neither of the 2 paths starts
> in xfs code. So I do not think waiting for PageWriteback causes a
> deadlock here.

The problem is this: the page that we are waiting for IO on is in
the IO completion queue, but the IO completion requires memory
allocation to complete the transaction. That memory allocation is
causing memcg reclaim, which then waits for IO completion on another
page, which may or may not end up in the same IO completion queue.
The CMWQ can continue to process new IO completions - up to a point
- so slow progress will be made. In the worst case, it can deadlock.

GFP_NOFS allocation is the mechanism by which filesystems are
supposed to be able to avoid this recursive deadlock...

> [...]
> > ... is running IO completion work and trying to commit a transaction
> > that is blocked in memory allocation which is waiting for IO
> > completion. It's disappeared up its own fundamental orifice.
> > 
> > Ok, this has absolutely nothing to do with the LRU changes - this is
> > a pre-existing XFS/mm interaction problem from around 3.2. The
> > question is now this: how the hell do I get memory allocation to not
> > block waiting on IO completion here? This is already being done in
> > GFP_NOFS allocation context here....
> 
> Just for reference: wait_on_page_writeback is issued only for memcg
> reclaim because there is no other throttling mechanism to prevent
> too many dirty pages on the list, and thus a premature OOM kill. See
> e62e384e9d (memcg: prevent OOM with too many dirty pages) for more
> details. The original patch relied on may_enter_fs but that check
> was removed later by c3b94f44fc (memcg: further prevent OOM
> with too many dirty pages).

Aye. That's the exact code I was looking at yesterday and wondering
"how the hell is waiting on page writeback valid in GFP_NOFS
context?". It seems that memcg reclaim is intentionally ignoring
GFP_NOFS to avoid OOM issues.  That's a memcg implementation problem,
not a filesystem or LRU infrastructure problem....
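
To make the GFP_NOFS point concrete, here is a small userspace model in C
of the two policies discussed above. It is only a sketch, not kernel code:
the MODEL_* flag values and both function names are made up for
illustration, and the real check in the reclaim path looks at more state
than this.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_GFP_IO     0x1u  /* caller may issue and wait for block IO */
#define MODEL_GFP_FS     0x2u  /* caller may recurse into filesystem code */
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)

/*
 * Policy as described in this thread: memcg reclaim waits on page
 * writeback no matter whether the caller cleared the FS bit.
 */
static bool waits_ignoring_nofs(bool memcg_reclaim, unsigned int gfp)
{
	(void)gfp;	/* the FS bit is deliberately not consulted */
	return memcg_reclaim;
}

/*
 * Policy in the spirit of the original may_enter_fs check (simplified
 * here to the FS bit alone): only wait when the caller is allowed to
 * re-enter the filesystem.
 */
static bool waits_honouring_nofs(bool memcg_reclaim, unsigned int gfp)
{
	return memcg_reclaim && (gfp & MODEL_GFP_FS) != 0;
}
```

With a GFP_NOFS-style mask, the first policy still waits during memcg
reclaim while the second does not. An IO completion worker allocating
under the first policy can therefore end up waiting on IO completion
itself, which is the recursion described above.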

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


