From: Michal Hocko <mhocko@kernel.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages
Date: Tue, 16 Jun 2020 11:27:27 +0200
Message-ID: <20200616092727.GD9499@dhcp22.suse.cz>
In-Reply-To: <CALOAHbDsCB1yZE6m96xiX1KiUWJW-8Hn0eqGcuEipkf9R6_L2A@mail.gmail.com>

On Tue 16-06-20 17:05:25, Yafang Shao wrote:
> On Tue, Jun 16, 2020 at 4:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Mon 15-06-20 07:56:21, Yafang Shao wrote:
> > > Recently there was an XFS deadlock on our server running an old kernel.
> > > This deadlock is caused by allocating memory in xfs_map_blocks() while
> > > doing writeback on behalf of memory reclaim. Although this deadlock happened
> > > on an old kernel, I think it could happen upstream as well. This
> > > issue has only happened once and can't be reproduced, so I haven't tried to
> > > reproduce it on an upstream kernel.
> > >
> > > Below is the call trace of this deadlock.
> > > [480594.790087] INFO: task redis-server:16212 blocked for more than 120 seconds.
> > > [480594.790087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [480594.790088] redis-server    D ffffffff8168bd60     0 16212  14347 0x00000004
> > > [480594.790090]  ffff880da128f070 0000000000000082 ffff880f94a2eeb0 ffff880da128ffd8
> > > [480594.790092]  ffff880da128ffd8 ffff880da128ffd8 ffff880f94a2eeb0 ffff88103f9d6c40
> > > [480594.790094]  0000000000000000 7fffffffffffffff ffff88207ffc0ee8 ffffffff8168bd60
> > > [480594.790096] Call Trace:
> > > [480594.790101]  [<ffffffff8168dce9>] schedule+0x29/0x70
> > > [480594.790103]  [<ffffffff8168b749>] schedule_timeout+0x239/0x2c0
> > > [480594.790111]  [<ffffffff8168d28e>] io_schedule_timeout+0xae/0x130
> > > [480594.790114]  [<ffffffff8168d328>] io_schedule+0x18/0x20
> > > [480594.790116]  [<ffffffff8168bd71>] bit_wait_io+0x11/0x50
> > > [480594.790118]  [<ffffffff8168b895>] __wait_on_bit+0x65/0x90
> > > [480594.790121]  [<ffffffff811814e1>] wait_on_page_bit+0x81/0xa0
> > > [480594.790125]  [<ffffffff81196ad2>] shrink_page_list+0x6d2/0xaf0
> > > [480594.790130]  [<ffffffff811975a3>] shrink_inactive_list+0x223/0x710
> > > [480594.790135]  [<ffffffff81198225>] shrink_lruvec+0x3b5/0x810
> > > [480594.790139]  [<ffffffff8119873a>] shrink_zone+0xba/0x1e0
> > > [480594.790141]  [<ffffffff81198c20>] do_try_to_free_pages+0x100/0x510
> > > [480594.790143]  [<ffffffff8119928d>] try_to_free_mem_cgroup_pages+0xdd/0x170
> > > [480594.790145]  [<ffffffff811f32de>] mem_cgroup_reclaim+0x4e/0x120
> > > [480594.790147]  [<ffffffff811f37cc>] __mem_cgroup_try_charge+0x41c/0x670
> > > [480594.790153]  [<ffffffff811f5cb6>] __memcg_kmem_newpage_charge+0xf6/0x180
> > > [480594.790157]  [<ffffffff8118c72d>] __alloc_pages_nodemask+0x22d/0x420
> > > [480594.790162]  [<ffffffff811d0c7a>] alloc_pages_current+0xaa/0x170
> > > [480594.790165]  [<ffffffff811db8fc>] new_slab+0x30c/0x320
> > > [480594.790168]  [<ffffffff811dd17c>] ___slab_alloc+0x3ac/0x4f0
> > > [480594.790204]  [<ffffffff81685656>] __slab_alloc+0x40/0x5c
> > > [480594.790206]  [<ffffffff811dfc43>] kmem_cache_alloc+0x193/0x1e0
> > > [480594.790233]  [<ffffffffa04fab67>] kmem_zone_alloc+0x97/0x130 [xfs]
> > > [480594.790247]  [<ffffffffa04f90ba>] _xfs_trans_alloc+0x3a/0xa0 [xfs]
> > > [480594.790261]  [<ffffffffa04f915c>] xfs_trans_alloc+0x3c/0x50 [xfs]
> >
> > Now, with more explanation from Dave, I have gone back to the original
> > backtrace. I am not sure which kernel version you are using, but this path
> > should have passed through xfs_trans_reserve, which sets PF_MEMALLOC_NOFS,
> > and that in turn should have made __alloc_pages_nodemask use __GFP_NOFS,
> > so the memcg reclaim shouldn't ever wait_on_page_writeback (presumably
> > that is where the io_schedule is coming from).
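For reference, a simplified, userspace-compilable sketch of the masking described above, in the spirit of the kernel's current_gfp_context() helper from <linux/sched/mm.h>; the flag values below are made up for illustration and are not the real kernel constants:

    /*
     * Sketch: a task carrying PF_MEMALLOC_NOFS has __GFP_FS stripped from its
     * allocation mask, so GFP_KERNEL effectively behaves like GFP_NOFS.
     */
    #include <assert.h>

    #define __GFP_IO          0x40u
    #define __GFP_FS          0x80u
    #define GFP_KERNEL        (0x01u | __GFP_IO | __GFP_FS)

    #define PF_MEMALLOC_NOIO  0x1u
    #define PF_MEMALLOC_NOFS  0x2u

    static unsigned int effective_gfp(unsigned int task_flags, unsigned int gfp)
    {
            if (task_flags & PF_MEMALLOC_NOIO)
                    gfp &= ~(__GFP_IO | __GFP_FS);  /* NOIO scope implies NOFS too */
            else if (task_flags & PF_MEMALLOC_NOFS)
                    gfp &= ~__GFP_FS;               /* reclaim must not recurse into the FS */
            return gfp;
    }

    int main(void)
    {
            /* Inside xfs_trans_reserve() the task carries PF_MEMALLOC_NOFS,
             * so a GFP_KERNEL allocation is treated as GFP_NOFS. */
            assert(!(effective_gfp(PF_MEMALLOC_NOFS, GFP_KERNEL) & __GFP_FS));
            return 0;
    }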
> 
> Hi Michal,
> 
> The page is allocated before calling xfs_trans_reserve, so
> PF_MEMALLOC_NOFS hasn't been set yet.
> See below:
> 
> xfs_trans_alloc
>     kmem_zone_zalloc() <<< the NOFS context hasn't been set yet, but this
> allocation may trigger memory reclaim
>     xfs_trans_reserve() <<<< PF_MEMALLOC_NOFS is set here (in kernels
> prior to commit 9070733b4efac it was PF_FSTRANS)
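As a purely hypothetical sketch (not the patch under discussion), the scoped helpers memalloc_nofs_save()/memalloc_nofs_restore() from <linux/sched/mm.h> illustrate how an allocation made before xfs_trans_reserve() could be forced into the NOFS context:

    /*
     * Hypothetical fragment only, paraphrased from the call sequence above.
     * memalloc_nofs_save()/memalloc_nofs_restore() are the real scoped-NOFS
     * helpers; the allocation call merely stands in for kmem_zone_zalloc().
     */
    unsigned int nofs_flag;
    struct xfs_trans *tp;

    nofs_flag = memalloc_nofs_save();       /* GFP_KERNEL now behaves like GFP_NOFS */
    tp = kzalloc(sizeof(*tp), GFP_KERNEL);  /* the early allocation can no longer
                                               recurse into filesystem reclaim */
    memalloc_nofs_restore(nofs_flag);       /* leave the scope; xfs_trans_reserve()
                                               sets PF_MEMALLOC_NOFS itself later */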

You are right, I had misread the actual allocation site. Commit 8683edb7755b8
added KM_NOFS and 73d30d48749f8 removed it. I cannot really
judge the correctness here.
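
For context on what KM_NOFS changes at the allocation call itself, here is a
trimmed-down paraphrase of XFS's kmem_flags_convert() (fs/xfs/kmem.h), with
everything except the KM_NOFS handling omitted:

    /*
     * Trimmed-down paraphrase of kmem_flags_convert(); other KM_* flags and
     * details are left out.
     */
    static inline gfp_t kmem_flags_convert(xfs_km_flags_t flags)
    {
            gfp_t lflags = GFP_KERNEL | __GFP_NOWARN;

            if (flags & KM_NOFS)
                    lflags &= ~__GFP_FS;  /* same effect as allocating with GFP_NOFS */

            return lflags;
    }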

-- 
Michal Hocko
SUSE Labs

Thread overview: 27+ messages
2020-06-15 11:56 [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages Yafang Shao
2020-06-15 14:25 ` Holger Hoffstätte
2020-06-15 14:51   ` Yafang Shao
2020-06-15 14:53   ` Michal Hocko
2020-06-15 15:07     ` Matthew Wilcox
2020-06-15 23:23       ` Dave Chinner
2020-06-15 15:08     ` Yafang Shao
2020-06-15 23:06     ` Dave Chinner
2020-06-16  7:56       ` Michal Hocko
2020-06-16 10:17       ` Yafang Shao
2020-06-16  8:16 ` Michal Hocko
2020-06-16  9:05   ` Yafang Shao
2020-06-16  9:27     ` Michal Hocko [this message]
2020-06-16  9:39       ` Yafang Shao
2020-06-16 10:48         ` Michal Hocko
2020-06-16 11:42           ` Yafang Shao
2020-06-18  0:34           ` Dave Chinner
2020-06-18 11:04             ` Yafang Shao
2020-06-22  1:23 ` [xfs] 59d77e81c5: WARNING:at_fs/iomap/buffered-io.c:#iomap_do_writepage kernel test robot
2020-06-22 12:20   ` Yafang Shao
