All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC v5 PATCH 5/9] xfs: automatic log item relog mechanism
Date: Thu, 27 Feb 2020 16:13:45 -0800	[thread overview]
Message-ID: <20200228001345.GS8045@magnolia> (raw)
In-Reply-To: <20200227134321.7238-6-bfoster@redhat.com>

On Thu, Feb 27, 2020 at 08:43:17AM -0500, Brian Foster wrote:
> Now that relog reservation is available and relog state tracking is
> in place, all that remains to automatically relog items is the relog
> mechanism itself. An item with relogging enabled is basically pinned
> from writeback until relog is disabled. Instead of being written
> back, the item must instead be periodically committed in a new
> transaction to move it in the physical log. The purpose of moving
> the item is to avoid long term tail pinning and thus avoid log
> deadlocks for long running operations.
> 
> The ideal time to relog an item is in response to tail pushing
> pressure. This accommodates the current workload at any given time
> as opposed to a fixed time interval or log reservation heuristic,
> which risks performance regression. This is essentially the same
> heuristic that drives metadata writeback. XFS already implements
> various log tail pushing heuristics that attempt to keep the log
> progressing on an active fileystem under various workloads.
> 
> The act of relogging an item simply requires to add it to a
> transaction and commit. This pushes the already dirty item into a
> subsequent log checkpoint and frees up its previous location in the
> on-disk log. Joining an item to a transaction of course requires
> locking the item first, which means we have to be aware of
> type-specific locks and lock ordering wherever the relog takes
> place.
> 
> Fundamentally, this points to xfsaild as the ideal location to
> process relog enabled items. xfsaild already processes log resident
> items, is driven by log tail pushing pressure, processes arbitrary
> log item types through callbacks, and is sensitive to type-specific
> locking rules by design. The fact that automatic relogging
> essentially diverts items between writeback or relog also suggests
> xfsaild as an ideal location to process items one way or the other.
> 
> Of course, we don't want xfsaild to process transactions as it is a
> critical component of the log subsystem for driving metadata
> writeback and freeing up log space. Therefore, similar to how
> xfsaild builds up a writeback queue of dirty items and queues writes
> asynchronously, make xfsaild responsible only for directing pending
> relog items into an appropriate queue and create an async
> (workqueue) context for processing the queue. The workqueue context
> utilizes the pre-reserved relog ticket to drain the queue by rolling
> a permanent transaction.

Aha!  I bet that's that workqueue I was musing about earlier.

> Update the AIL pushing infrastructure to support a new RELOG item
> state. If a log item push returns the relog state, queue the item
> for relog instead of writeback. On completion of a push cycle,
> schedule the relog task at the same point metadata buffer I/O is
> submitted. This allows items to be relogged automatically under the
> same locking rules and pressure heuristics that govern metadata
> writeback.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_trace.h      |   1 +
>  fs/xfs/xfs_trans.h      |   1 +
>  fs/xfs/xfs_trans_ail.c  | 103 +++++++++++++++++++++++++++++++++++++++-
>  fs/xfs/xfs_trans_priv.h |   3 ++
>  4 files changed, 106 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index a066617ec54d..df0114ec66f1 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1063,6 +1063,7 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
>  DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
>  DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
>  DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
> +DEFINE_LOG_ITEM_EVENT(xfs_ail_relog);
>  DEFINE_LOG_ITEM_EVENT(xfs_relog_item);
>  DEFINE_LOG_ITEM_EVENT(xfs_relog_item_cancel);
>  
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index fc4c25b6eee4..1637df32c64c 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -99,6 +99,7 @@ void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
>  #define XFS_ITEM_PINNED		1
>  #define XFS_ITEM_LOCKED		2
>  #define XFS_ITEM_FLUSHING	3
> +#define XFS_ITEM_RELOG		4
>  
>  /*
>   * Deferred operation item relogging limits.
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index a3fb64275baa..71a47faeaae8 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -144,6 +144,75 @@ xfs_ail_max_lsn(
>  	return lsn;
>  }
>  
> +/*
> + * Relog log items on the AIL relog queue.
> + */
> +static void
> +xfs_ail_relog(
> +	struct work_struct	*work)
> +{
> +	struct xfs_ail		*ailp = container_of(work, struct xfs_ail,
> +						     ail_relog_work);
> +	struct xfs_mount	*mp = ailp->ail_mount;
> +	struct xfs_trans_res	tres = {};
> +	struct xfs_trans	*tp;
> +	struct xfs_log_item	*lip;
> +	int			error;
> +
> +	/*
> +	 * The first transaction to submit a relog item contributed relog
> +	 * reservation to the relog ticket before committing. Create an empty
> +	 * transaction and manually associate the relog ticket.
> +	 */
> +	error = xfs_trans_alloc(mp, &tres, 0, 0, 0, &tp);

Ah, and I see that the work item actually does create its own
transaction to relog the items...

> +	ASSERT(!error);
> +	if (error)
> +		return;
> +	tp->t_log_res = M_RES(mp)->tr_relog.tr_logres;
> +	tp->t_log_count = M_RES(mp)->tr_relog.tr_logcount;
> +	tp->t_flags |= M_RES(mp)->tr_relog.tr_logflags;
> +	tp->t_ticket = xfs_log_ticket_get(ailp->ail_relog_tic);
> +
> +	spin_lock(&ailp->ail_lock);
> +	while ((lip = list_first_entry_or_null(&ailp->ail_relog_list,
> +					       struct xfs_log_item,
> +					       li_trans)) != NULL) {

...but this part really cranks up my curiosity about what happens when
there are more items to relog than there is actual reservation in this
transaction?  I think most transactions types reserve enough space that
we could attach hundreds of relogged intent items.

> +		/*
> +		 * Drop the AIL processing ticket reference once the relog list
> +		 * is emptied. At this point it's possible for our transaction
> +		 * to hold the only reference.
> +		 */
> +		list_del_init(&lip->li_trans);
> +		if (list_empty(&ailp->ail_relog_list))
> +			xfs_log_ticket_put(ailp->ail_relog_tic);
> +		spin_unlock(&ailp->ail_lock);
> +
> +		xfs_trans_add_item(tp, lip);
> +		set_bit(XFS_LI_DIRTY, &lip->li_flags);
> +		tp->t_flags |= XFS_TRANS_DIRTY;
> +		/* XXX: include ticket owner task fix */

XXX?

--D

> +		error = xfs_trans_roll(&tp);
> +		ASSERT(!error);
> +		if (error)
> +			goto out;
> +		spin_lock(&ailp->ail_lock);
> +	}
> +	spin_unlock(&ailp->ail_lock);
> +
> +out:
> +	/* XXX: handle shutdown scenario */
> +	/*
> +	 * Drop the relog reference owned by the transaction separately because
> +	 * we don't want the cancel to release reservation if this isn't the
> +	 * final reference. The relog ticket and associated reservation needs
> +	 * to persist so long as relog items are active in the log subsystem.
> +	 */
> +	xfs_trans_ail_relog_put(mp);
> +
> +	tp->t_ticket = NULL;
> +	xfs_trans_cancel(tp);
> +}
> +
>  /*
>   * The cursor keeps track of where our current traversal is up to by tracking
>   * the next item in the list for us. However, for this to be safe, removing an
> @@ -364,7 +433,7 @@ static long
>  xfsaild_push(
>  	struct xfs_ail		*ailp)
>  {
> -	xfs_mount_t		*mp = ailp->ail_mount;
> +	struct xfs_mount	*mp = ailp->ail_mount;
>  	struct xfs_ail_cursor	cur;
>  	struct xfs_log_item	*lip;
>  	xfs_lsn_t		lsn;
> @@ -426,6 +495,23 @@ xfsaild_push(
>  			ailp->ail_last_pushed_lsn = lsn;
>  			break;
>  
> +		case XFS_ITEM_RELOG:
> +			/*
> +			 * The item requires a relog. Add to the pending relog
> +			 * list and set the relogged bit to prevent further
> +			 * relog requests. The relog bit and ticket reference
> +			 * can be dropped from the item at any point, so hold a
> +			 * relog ticket reference for the pending relog list to
> +			 * ensure the ticket stays around.
> +			 */
> +			trace_xfs_ail_relog(lip);
> +			ASSERT(list_empty(&lip->li_trans));
> +			if (list_empty(&ailp->ail_relog_list))
> +				xfs_log_ticket_get(ailp->ail_relog_tic);
> +			list_add_tail(&lip->li_trans, &ailp->ail_relog_list);
> +			set_bit(XFS_LI_RELOGGED, &lip->li_flags);
> +			break;
> +
>  		case XFS_ITEM_FLUSHING:
>  			/*
>  			 * The item or its backing buffer is already being
> @@ -492,6 +578,9 @@ xfsaild_push(
>  	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
>  		ailp->ail_log_flush++;
>  
> +	if (!list_empty(&ailp->ail_relog_list))
> +		queue_work(ailp->ail_relog_wq, &ailp->ail_relog_work);
> +
>  	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
>  out_done:
>  		/*
> @@ -922,15 +1011,24 @@ xfs_trans_ail_init(
>  	spin_lock_init(&ailp->ail_lock);
>  	INIT_LIST_HEAD(&ailp->ail_buf_list);
>  	init_waitqueue_head(&ailp->ail_empty);
> +	INIT_LIST_HEAD(&ailp->ail_relog_list);
> +	INIT_WORK(&ailp->ail_relog_work, xfs_ail_relog);
> +
> +	ailp->ail_relog_wq = alloc_workqueue("xfs-relog/%s", WQ_FREEZABLE, 0,
> +					     mp->m_super->s_id);
> +	if (!ailp->ail_relog_wq)
> +		goto out_free_ailp;
>  
>  	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
>  			ailp->ail_mount->m_super->s_id);
>  	if (IS_ERR(ailp->ail_task))
> -		goto out_free_ailp;
> +		goto out_destroy_wq;
>  
>  	mp->m_ail = ailp;
>  	return 0;
>  
> +out_destroy_wq:
> +	destroy_workqueue(ailp->ail_relog_wq);
>  out_free_ailp:
>  	kmem_free(ailp);
>  	return -ENOMEM;
> @@ -944,5 +1042,6 @@ xfs_trans_ail_destroy(
>  
>  	ASSERT(ailp->ail_relog_tic == NULL);
>  	kthread_stop(ailp->ail_task);
> +	destroy_workqueue(ailp->ail_relog_wq);
>  	kmem_free(ailp);
>  }
> diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
> index d1edec1cb8ad..33a724534869 100644
> --- a/fs/xfs/xfs_trans_priv.h
> +++ b/fs/xfs/xfs_trans_priv.h
> @@ -63,6 +63,9 @@ struct xfs_ail {
>  	int			ail_log_flush;
>  	struct list_head	ail_buf_list;
>  	wait_queue_head_t	ail_empty;
> +	struct work_struct	ail_relog_work;
> +	struct list_head	ail_relog_list;
> +	struct workqueue_struct	*ail_relog_wq;
>  	struct xlog_ticket	*ail_relog_tic;
>  };
>  
> -- 
> 2.21.1
> 

  parent reply	other threads:[~2020-02-28  0:13 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-27 13:43 [RFC v5 PATCH 0/9] xfs: automatic relogging experiment Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 1/9] xfs: set t_task at wait time instead of alloc time Brian Foster
2020-02-27 20:48   ` Allison Collins
2020-02-27 23:28   ` Darrick J. Wong
2020-02-28  0:10     ` Dave Chinner
2020-02-28 13:46       ` Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 2/9] xfs: introduce ->tr_relog transaction Brian Foster
2020-02-27 20:49   ` Allison Collins
2020-02-27 23:31   ` Darrick J. Wong
2020-02-28 13:52     ` Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 3/9] xfs: automatic relogging reservation management Brian Foster
2020-02-27 20:49   ` Allison Collins
2020-02-28  0:02   ` Darrick J. Wong
2020-02-28 13:55     ` Brian Foster
2020-03-02  3:07   ` Dave Chinner
2020-03-02 18:06     ` Brian Foster
2020-03-02 23:25       ` Dave Chinner
2020-03-03  4:07         ` Dave Chinner
2020-03-03 15:12           ` Brian Foster
2020-03-03 21:47             ` Dave Chinner
2020-03-03 14:13         ` Brian Foster
2020-03-03 21:26           ` Dave Chinner
2020-03-04 14:03             ` Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 4/9] xfs: automatic relogging item management Brian Foster
2020-02-27 21:18   ` Allison Collins
2020-03-02  5:58   ` Dave Chinner
2020-03-02 18:08     ` Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 5/9] xfs: automatic log item relog mechanism Brian Foster
2020-02-27 22:54   ` Allison Collins
2020-02-28  0:13   ` Darrick J. Wong [this message]
2020-02-28 14:02     ` Brian Foster
2020-03-02  7:32       ` Dave Chinner
2020-03-02  7:18   ` Dave Chinner
2020-03-02 18:52     ` Brian Foster
2020-03-03  0:06       ` Dave Chinner
2020-03-03 14:14         ` Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 6/9] xfs: automatically relog the quotaoff start intent Brian Foster
2020-02-27 23:19   ` Allison Collins
2020-02-28 14:03     ` Brian Foster
2020-02-28 18:55       ` Allison Collins
2020-02-28  1:16   ` Darrick J. Wong
2020-02-28 14:04     ` Brian Foster
2020-02-29  5:35       ` Darrick J. Wong
2020-02-29 12:15         ` Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 7/9] xfs: buffer relogging support prototype Brian Foster
2020-02-27 23:33   ` Allison Collins
2020-02-28 14:04     ` Brian Foster
2020-03-02  7:47   ` Dave Chinner
2020-03-02 19:00     ` Brian Foster
2020-03-03  0:09       ` Dave Chinner
2020-03-03 14:14         ` Brian Foster
2020-02-27 13:43 ` [RFC v5 PATCH 8/9] xfs: create an error tag for random relog reservation Brian Foster
2020-02-27 23:35   ` Allison Collins
2020-02-27 13:43 ` [RFC v5 PATCH 9/9] xfs: relog random buffers based on errortag Brian Foster
2020-02-27 23:48   ` Allison Collins
2020-02-28 14:06     ` Brian Foster
2020-02-27 15:09 ` [RFC v5 PATCH 0/9] xfs: automatic relogging experiment Darrick J. Wong
2020-02-27 15:18   ` Brian Foster
2020-02-27 15:22     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200228001345.GS8045@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.