linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chandan Rajendra <chandan@linux.ibm.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Chandan Rajendra <chandanrlinux@gmail.com>,
	linux-xfs@vger.kernel.org, darrick.wong@oracle.com,
	hch@infradead.org, david@fromorbit.com
Subject: Re: [RFC] xfs: Flush iclog containing XLOG_COMMIT_TRANS before waiting for log space
Date: Thu, 22 Aug 2019 18:16:37 +0530	[thread overview]
Message-ID: <1965562.CDkx7u9Umj@localhost.localdomain> (raw)
In-Reply-To: <20190821164240.GD19646@bfoster>

On Wednesday, August 21, 2019 10:12 PM Brian Foster wrote:
> On Wed, Aug 21, 2019 at 04:34:48PM +0530, Chandan Rajendra wrote:
> > The following call trace is seen when executing generic/530 on a ppc64le
> > machine,
> > 
> 
> Could you provide xfs_info of the scratch device that reproduces this
> problem? I'm curious because I don't recall seeing this, but I'm also
> not sure I've run those iunlink tests on ppc64...

# xfs_info /dev/loop1
meta-data=/dev/loop1             isize=512    agcount=4, agsize=159648 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1
data     =                       bsize=4096   blocks=638592, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=3693, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

> 
> > INFO: task mount:7722 blocked for more than 122 seconds.
> >       Not tainted 5.3.0-rc1-next-20190723-00001-g1867922e5cbf-dirty #6
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > mount           D 8448  7722   7490 0x00040008
> > Call Trace:
> > [c000000629343210] [0000000000000001] 0x1 (unreliable)
> > [c0000006293433f0] [c000000000021acc] __switch_to+0x2ac/0x490
> > [c000000629343450] [c000000000fbbbf4] __schedule+0x394/0xb50
> > [c000000629343510] [c000000000fbc3f4] schedule+0x44/0xf0
> > [c000000629343540] [c0000000007623b4] xlog_grant_head_wait+0x84/0x420
> > [c0000006293435b0] [c000000000762828] xlog_grant_head_check+0xd8/0x1e0
> > [c000000629343600] [c000000000762f6c] xfs_log_reserve+0x26c/0x310
> > [c000000629343690] [c00000000075defc] xfs_trans_reserve+0x28c/0x3e0
> > [c0000006293436e0] [c0000000007606ac] xfs_trans_alloc+0xfc/0x2f0
> > [c000000629343780] [c000000000749ca8] xfs_inactive_ifree+0x248/0x2a0
> > [c000000629343810] [c000000000749e58] xfs_inactive+0x158/0x300
> > [c000000629343850] [c000000000758554] xfs_fs_destroy_inode+0x104/0x3f0
> > [c000000629343890] [c00000000046850c] destroy_inode+0x6c/0xc0
> > [c0000006293438c0] [c00000000074c748] xfs_irele+0x168/0x1d0
> > [c000000629343900] [c000000000778c78] xlog_recover_process_one_iunlink+0x118/0x1e0
> > [c000000629343960] [c000000000778e10] xlog_recover_process_iunlinks+0xd0/0x130
> > [c0000006293439b0] [c000000000782408] xlog_recover_finish+0x58/0x130
> > [c000000629343a20] [c000000000763818] xfs_log_mount_finish+0xa8/0x1d0
> > [c000000629343a60] [c000000000750908] xfs_mountfs+0x6e8/0x9e0
> > [c000000629343b20] [c00000000075a210] xfs_fs_fill_super+0x5a0/0x7c0
> > [c000000629343bc0] [c00000000043e7fc] mount_bdev+0x25c/0x2a0
> > [c000000629343c60] [c000000000757c48] xfs_fs_mount+0x28/0x40
> > [c000000629343c80] [c0000000004956cc] legacy_get_tree+0x4c/0xb0
> > [c000000629343cb0] [c00000000043d690] vfs_get_tree+0x50/0x160
> > [c000000629343d30] [c0000000004775d4] do_mount+0xa14/0xc20
> > [c000000629343db0] [c000000000477d48] ksys_mount+0xc8/0x180
> > [c000000629343e00] [c000000000477e20] sys_mount+0x20/0x30
> > [c000000629343e20] [c00000000000b864] system_call+0x5c/0x70
> > 
> > i.e. the mount task gets hung indefinitely due to the following sequence
> > of events,
> > 
> > 1. Test creates lots of unlinked temp files and then shutsdown the
> >    filesystem.
> > 2. During mount, a transaction started in the context of processing
> >    unlinked inode list causes several iclogs to be filled up. All but
> >    the last one is submitted for I/O.
> 
> So the inactivation transaction commits should basically start to
> populate the CIL. At some point we cross a threshold and a commit
> triggers a background CIL push (see xlog_cil_push_background()). I'm
> assuming that occurs somewhere in here because otherwise we wouldn't
> have started filling iclogs..
> 
> > 3. After writing XLOG_COMMIT_TRANS record into the iclog, we will have
> >    18532 bytes of free space in the last iclog of the transaction which is
> >    greater than 2*sizeof(xlog_op_header_t). Hence
> >    xlog_state_get_iclog_space() does not switch over to using a newer iclog.
> 
> Ok, so we've sent a bunch of iclogs to disk and the commit record for
> the checkpoint ends up in the current iclog log, which remains active on
> return from xlog_commit_record() -> xlog_write() etc.
> 
> > 4. Meanwhile, the endio code processing iclogs of the transaction do not
> >    insert items into the AIL since the iclog containing XLOG_COMMIT_TRANS
> >    hasn't been submitted for I/O yet. Hence a major part of the on-disk
> >    log cannot be freed yet.
> 
> Ok, so that final (still active) iclog holding the commit record is also
> holding the ctx.
> 
> > 5. A new request for log space (via xfs_log_reserve()) will now wait
> >    indefinitely for on-disk log space to be freed.
> > 
> > To fix this issue, before waiting for log space to be freed, this commit
> > now submits xlog->l_iclog for write I/O if iclog->ic_state is
> > XLOG_STATE_ACTIVE and iclog has metadata written into it. This causes
> > AIL list to be populated and a later call to xlog_grant_push_ail() will
> > free up the on-disk log space.
> > 
> 
> Interesting, so I think I follow the issue at least and what the code is
> trying to do to fix it. I'm not totally clear on what the ideal approach
> is as this code is complex and I'd need to think about it some more. My
> first thought was perhaps whether we needed a checkpoint size limit or
> to change the background push threshold or some such, but the more I
> think about that the less I think that fixes anything.
> 
> A follow up thought is that it seems somewhat unfortunate that we can
> build up and hold so much reservation on essentially a single commit
> record and then just let it sit around in-core waiting for somebody else
> to come along and ask for reservation for something unrelated. I think a
> background log force should eventually come around and push that last
> record out, so we're not in that state indefinitely, but IIUC that's a
> lot of log consumption essentially unaccounted for until the associated
> items land in the AIL.
> 
> I'm wondering if we should have some measure (if we don't already) of
> how much reservation is tied into a single iclog and use that as an
> additional metric to determine when to switch iclogs a bit more
> aggressively (as opposed to only based on how much space is physically
> left in the iclog buffer). For example, if we end a huge checkpoint with
> a commit record in an active iclog that has a cil context attached
> associated with a hundreds of MBs (handwaving) of log data, perhaps we
> should just switch out of that iclog before the CIL push returns so that
> consumption can be immediately reflected by the AIL.
> 
> Anyways, perhaps there are other thoughts/ideas on this..
> 
> Brian
> 
> > Signed-off-by: Chandan Rajendra <chandanrlinux@gmail.com>
> > ---
> >  fs/xfs/xfs_log.c | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 00e9f5c388d3..dc785a6b9f47 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -236,11 +236,32 @@ xlog_grant_head_wait(
> >  	int			need_bytes) __releases(&head->lock)
> >  					    __acquires(&head->lock)
> >  {
> > +	struct xlog_in_core	*iclog;
> > +
> >  	list_add_tail(&tic->t_queue, &head->waiters);
> >  
> >  	do {
> >  		if (XLOG_FORCED_SHUTDOWN(log))
> >  			goto shutdown;
> > +
> > +		if (xfs_ail_min(log->l_ailp) == NULL) {
> > +			spin_lock(&log->l_icloglock);
> > +			iclog = log->l_iclog;
> > +
> > +			if (iclog->ic_state == XLOG_STATE_ACTIVE
> > +				&& iclog->ic_offset) {
> > +				atomic_inc(&iclog->ic_refcnt);
> > +				xlog_state_want_sync(log, iclog);
> > +				spin_unlock(&log->l_icloglock);
> > +				xlog_state_release_iclog(log, iclog);
> > +
> > +				spin_lock(&log->l_icloglock);
> > +				xlog_wait(&iclog->ic_force_wait, &log->l_icloglock);
> > +			} else {
> > +				spin_unlock(&log->l_icloglock);
> > +			}
> > +		}
> > +
> >  		xlog_grant_push_ail(log, need_bytes);
> >  
> >  		__set_current_state(TASK_UNINTERRUPTIBLE);
> 


-- 
chandan




  reply	other threads:[~2019-08-22 12:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-21 11:04 [RFC] xfs: Flush iclog containing XLOG_COMMIT_TRANS before waiting for log space Chandan Rajendra
2019-08-21 16:42 ` Brian Foster
2019-08-22 12:46   ` Chandan Rajendra [this message]
2019-08-21 22:18 ` Dave Chinner
2019-08-22 12:40   ` Chandan Rajendra
2019-08-22 13:19     ` Chandan Rajendra
2019-08-22 16:34   ` Brian Foster
2019-08-23  0:06     ` Dave Chinner
2019-08-23 13:08       ` Brian Foster
2019-08-23 13:38         ` Chandan Rajendra
2019-08-25 15:05           ` Chandan Rajendra
2019-08-26  0:32             ` Dave Chinner
2019-08-26 12:39               ` Brian Foster
2019-08-27 23:10                 ` Dave Chinner
2019-08-28 12:06                   ` Brian Foster
2019-08-29  5:21               ` Chandan Rajendra
2019-08-29 23:08                 ` Dave Chinner
2019-08-30  0:34                   ` Dave Chinner
2019-08-30  2:13                     ` Dave Chinner
2019-08-30  6:05                       ` Chandan Rajendra
2019-08-30 17:24                       ` Brian Foster
2019-08-30 22:10                         ` Dave Chinner
2019-08-30 23:18                           ` Brian Foster
     [not found] ` <20190830164750.GD26520@bfoster>
     [not found]   ` <2367290.sgLJaTIShl@localhost.localdomain>
2019-08-31  6:58     ` Chandan Rajendra
2019-08-31 13:07       ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1965562.CDkx7u9Umj@localhost.localdomain \
    --to=chandan@linux.ibm.com \
    --cc=bfoster@redhat.com \
    --cc=chandanrlinux@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).