linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Chandan Rajendra <chandan@linux.ibm.com>,
	Chandan Rajendra <chandanrlinux@gmail.com>,
	linux-xfs@vger.kernel.org, darrick.wong@oracle.com,
	hch@infradead.org
Subject: Re: [RFC] xfs: Flush iclog containing XLOG_COMMIT_TRANS before waiting for log space
Date: Fri, 30 Aug 2019 13:24:47 -0400	[thread overview]
Message-ID: <20190830172447.GE26520@bfoster> (raw)
In-Reply-To: <20190830021329.GB1119@dread.disaster.area>

On Fri, Aug 30, 2019 at 12:13:29PM +1000, Dave Chinner wrote:
> On Fri, Aug 30, 2019 at 10:34:41AM +1000, Dave Chinner wrote:
> > On Fri, Aug 30, 2019 at 09:08:17AM +1000, Dave Chinner wrote:
> > > On Thu, Aug 29, 2019 at 10:51:59AM +0530, Chandan Rajendra wrote:
> > > > 	 786576: kworker/4:1H-kb  1825 [004]   217.041079:                       xfs:xfs_log_assign_tail_lsn: dev 7:1 new tail lsn 2/19333, old lsn 2/19330, last sync 3/18501
> > > 
> > > 200ms later the tail has moved, and last_sync_lsn is now 3/18501.
> > > i.e. the iclog writes have made it to disk, and the items have been
> > > moved into the AIL. I don't know where that came from, but I'm
> > > assuming it's an IO completion based on it being run from a
> > > kworker context that doesn't have an "xfs-" name prefix(*).
> > > 
> > > As the tail has moved, this should have woken the anything sleeping
> > > on the log tail in xlog_grant_head_wait() via a call to
> > > xfs_log_space_wake(). The first waiter should wake, see that there
> > > still isn't room in the log (only 3 sectors were freed in the log,
> > > we need at least 60). That woken process should then run
> > > xlog_grant_push_ail() again and go back to sleep.
> > 
> > Actually, it doesn't get woken because xlog_grant_head_wake() checks
> > how much space is available before waking waiters, and there clearly
> > isn't enough here. So that's one likely vector. Can you try this
> > patch?
> 
> And this one on top to address the situation the previous patch
> doesn't....
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> xfs: push the grant head when the log head moves forward
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> When the log fills up, we can get into the state where the
> outstanding items in the CIL being committed and aggregated are
> larger than the range that the reservation grant head tail pushing
> will attempt to clean. This can result in the tail pushing range
> being trimmed back to the the log head (l_last_sync_lsn) and so
> may not actually move the push target at all.
> 
> When the iclogs associated with the CIL commit finally land, the
> log head moves forward, and this removes the restriction on the AIL
> push target. However, if we already have transactions sleeping on
> the grant head, and there's nothing in the AIL still to flush from
> the current push target, then nothing will move the tail of the log
> and trigger a log reservation wakeup.
> 
> Hence the there is nothing that will trigger xlog_grant_push_ail()
> to recalculate the AIL push target and start pushing on the AIL
> again to write back the metadata objects that pin the tail of the
> log and hence free up space and allow the transaction reservations
> to be woken and make progress.
> 
> Hence we need to push on the grant head when we move the log head
> forward, as this may be the only trigger we have that can move the
> AIL push target forwards in this situation.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c | 65 ++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 40 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 941f10ff99d9..ce46bb244442 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2612,6 +2612,45 @@ xlog_get_lowest_lsn(
>  	return lowest_lsn;
>  }
>  
> +
> +/*
> + * Completion of a iclog IO does not imply that a transaction has completed, as
> + * transactions can be large enough to span many iclogs. We cannot change the
> + * tail of the log half way through a transaction as this may be the only
> + * transaction in the log and moving the tail to point to the middle of it
> + * will prevent recovery from finding the start of the transaction. Hence we
> + * should only update the last_sync_lsn if this iclog contains transaction
> + * completion callbacks on it.
> + *
> + * We have to do this before we drop the icloglock to ensure we are the only one
> + * that can update it.
> + *
> + * If we are moving the last_sync_lsn forwards, we also need to ensure we kick
> + * the reservation grant head pushing. This is due to the fact that the push
> + * target is bound by the current last_sync_lsn value. Hence if we have a large
> + * amount of log space bound up in this committing transaction then the
> + * last_sync_lsn value may be the limiting factor preventing tail pushing from
> + * freeing space in the log. Hence once we've updated the last_sync_lsn we
> + * should push the AIL to ensure the push target (and hence the grant head) is
> + * no longer bound by the old log head location and can move forwards and make
> + * progress again.
> + */
> +static void
> +xlog_iclog_iodone(
> +	struct xlog		*log,
> +	struct xlog_in_core	*iclog)
> +{
> +	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> +		be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
> +
> +	if (list_empty_careful(&iclog->ic_callbacks))
> +		return;
> +
> +	atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(iclog->ic_header.h_lsn));
> +	xlog_grant_push_ail(log, 0);

I think the intent of the patch makes sense but the wakeup is misplaced.
First, this wakeup technically occurs before AIL population because we
haven't invoked the iclog callbacks yet. Second, your first patch
already adds an AIL push in the log I/O completion path via
xfs_trans_ail_update_bulk() -> xfs_log_space_wake(). The problem may
just be that it's only invoked on tail changes and not empty ->
populated state changes. If so, that could be addressed with a check for
mlip == NULL in xfs_trans_ail_update_bulk().

(Actually, the first patch alone seems to be enough to prevent the
problem in my tests. I suppose I could be seeing a variation of the race
from what Chandan sees and a wakeup from the second CIL ctx ticket
release or something is saving me...).

Brian

> +
> +}
> +
>  STATIC void
>  xlog_state_do_callback(
>  	struct xlog		*log,
> @@ -2710,31 +2749,7 @@ xlog_state_do_callback(
>  				}
>  
>  				iclog->ic_state = XLOG_STATE_CALLBACK;
> -
> -
> -				/*
> -				 * Completion of a iclog IO does not imply that
> -				 * a transaction has completed, as transactions
> -				 * can be large enough to span many iclogs. We
> -				 * cannot change the tail of the log half way
> -				 * through a transaction as this may be the only
> -				 * transaction in the log and moving th etail to
> -				 * point to the middle of it will prevent
> -				 * recovery from finding the start of the
> -				 * transaction. Hence we should only update the
> -				 * last_sync_lsn if this iclog contains
> -				 * transaction completion callbacks on it.
> -				 *
> -				 * We have to do this before we drop the
> -				 * icloglock to ensure we are the only one that
> -				 * can update it.
> -				 */
> -				ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> -					be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
> -				if (!list_empty_careful(&iclog->ic_callbacks))
> -					atomic64_set(&log->l_last_sync_lsn,
> -						be64_to_cpu(iclog->ic_header.h_lsn));
> -
> +				xlog_iclog_iodone(log, iclog);
>  			} else
>  				ioerrors++;
>  

  parent reply	other threads:[~2019-08-30 17:24 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-21 11:04 [RFC] xfs: Flush iclog containing XLOG_COMMIT_TRANS before waiting for log space Chandan Rajendra
2019-08-21 16:42 ` Brian Foster
2019-08-22 12:46   ` Chandan Rajendra
2019-08-21 22:18 ` Dave Chinner
2019-08-22 12:40   ` Chandan Rajendra
2019-08-22 13:19     ` Chandan Rajendra
2019-08-22 16:34   ` Brian Foster
2019-08-23  0:06     ` Dave Chinner
2019-08-23 13:08       ` Brian Foster
2019-08-23 13:38         ` Chandan Rajendra
2019-08-25 15:05           ` Chandan Rajendra
2019-08-26  0:32             ` Dave Chinner
2019-08-26 12:39               ` Brian Foster
2019-08-27 23:10                 ` Dave Chinner
2019-08-28 12:06                   ` Brian Foster
2019-08-29  5:21               ` Chandan Rajendra
2019-08-29 23:08                 ` Dave Chinner
2019-08-30  0:34                   ` Dave Chinner
2019-08-30  2:13                     ` Dave Chinner
2019-08-30  6:05                       ` Chandan Rajendra
2019-08-30 17:24                       ` Brian Foster [this message]
2019-08-30 22:10                         ` Dave Chinner
2019-08-30 23:18                           ` Brian Foster
     [not found] ` <20190830164750.GD26520@bfoster>
     [not found]   ` <2367290.sgLJaTIShl@localhost.localdomain>
2019-08-31  6:58     ` Chandan Rajendra
2019-08-31 13:07       ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190830172447.GE26520@bfoster \
    --to=bfoster@redhat.com \
    --cc=chandan@linux.ibm.com \
    --cc=chandanrlinux@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).