All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-xfs@vger.kernel.org, linux-block@vger.kernel.org, hch@lst.de
Subject: Re: [PATCH 2/2] xfs: add 'discard_sync' mount flag
Date: Mon, 30 Apr 2018 13:19:46 -0400	[thread overview]
Message-ID: <20180430171945.GB22176@bfoster.bfoster> (raw)
In-Reply-To: <1525102372-8430-3-git-send-email-axboe@kernel.dk>

On Mon, Apr 30, 2018 at 09:32:52AM -0600, Jens Axboe wrote:
> XFS recently added support for async discards. While this can be
> a win for some workloads and devices, there are also cases where
> async bursty discard will severly harm the latencies of reads
> and writes.
> 
> Add a 'discard_sync' mount flag to revert to using sync discard,
> issuing them one at the time and waiting for each one. This fixes
> a big performance regression we had moving to kernels that include
> the XFS async discard support.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---

Hm, I figured the async discard stuff would have been a pretty clear win
all around, but then again I'm not terribly familiar with what happens
with discards beneath the fs. I do know that the previous behavior would
cause fs level latencies due to holding up log I/O completion while
discards completed one at a time. My understanding is that this lead to
online discard being pretty much universally "not recommended" in favor
of fstrim.

Do you have any more data around the workload where the old sync discard
behavior actually provides an overall win over the newer behavior? Is it
purely a matter of workload, or is it a workload+device thing with how
discards may be handled by certain devices?

I'm ultimately not against doing this if it's useful for somebody and is
otherwise buried under a mount option, but it would be nice to see if
there's opportunity to improve the async mechanism before resorting to
that. Is the issue here too large bio chains, too many chains at once,
or just simply too many discards (regardless of chaining) at the same
time?

I'm wondering if xfs could separate discard submission from log I/O
completion entirely and then perhaps limit/throttle submission somehow
or another (Christoph, thoughts?) via a background task. Perhaps doing
something like that may even eliminate the need for some discards on
busy filesystems with a lot of block free -> reallocation activity, but
I'm just handwaving atm.

Brian

>  fs/xfs/xfs_log_cil.c | 18 ++++++++++++++----
>  fs/xfs/xfs_mount.h   |  1 +
>  fs/xfs/xfs_super.c   |  7 ++++++-
>  3 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 4668403b1741..4eced3eceea5 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -552,10 +552,16 @@ xlog_discard_busy_extents(
>  	struct bio		*bio = NULL;
>  	struct blk_plug		plug;
>  	int			error = 0;
> +	int			discard_flags = 0;
> +	bool			sync_discard = mp->m_flags & XFS_MOUNT_DISCARD_SYNC;
>  
>  	ASSERT(mp->m_flags & XFS_MOUNT_DISCARD);
>  
> -	blk_start_plug(&plug);
> +	if (!sync_discard)
> +		blk_start_plug(&plug);
> +	else
> +		discard_flags = BLKDEV_DISCARD_SYNC;
> +
>  	list_for_each_entry(busyp, list, list) {
>  		trace_xfs_discard_extent(mp, busyp->agno, busyp->bno,
>  					 busyp->length);
> @@ -563,7 +569,7 @@ xlog_discard_busy_extents(
>  		error = __blkdev_issue_discard(mp->m_ddev_targp->bt_bdev,
>  				XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno),
>  				XFS_FSB_TO_BB(mp, busyp->length),
> -				GFP_NOFS, 0, &bio);
> +				GFP_NOFS, discard_flags, &bio);
>  		if (error && error != -EOPNOTSUPP) {
>  			xfs_info(mp,
>  	 "discard failed for extent [0x%llx,%u], error %d",
> @@ -574,14 +580,18 @@ xlog_discard_busy_extents(
>  		}
>  	}
>  
> -	if (bio) {
> +	if (sync_discard) {
> +		xfs_extent_busy_clear(mp, &ctx->busy_extents, false);
> +		kmem_free(ctx);
> +	} else if (bio) {
>  		bio->bi_private = ctx;
>  		bio->bi_end_io = xlog_discard_endio;
>  		submit_bio(bio);
> +		blk_finish_plug(&plug);
>  	} else {
>  		xlog_discard_endio_work(&ctx->discard_endio_work);
> +		blk_finish_plug(&plug);
>  	}
> -	blk_finish_plug(&plug);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 10b90bbc5162..3f0b7b9106c7 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -219,6 +219,7 @@ typedef struct xfs_mount {
>  						   operations, typically for
>  						   disk errors in metadata */
>  #define XFS_MOUNT_DISCARD	(1ULL << 5)	/* discard unused blocks */
> +#define XFS_MOUNT_DISCARD_SYNC	(1ULL << 6)	/* discard unused blocks sync */
>  #define XFS_MOUNT_NOALIGN	(1ULL << 7)	/* turn off stripe alignment
>  						   allocations */
>  #define XFS_MOUNT_ATTR2		(1ULL << 8)	/* allow use of attr2 format */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d71424052917..6d960bb4725f 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -83,7 +83,7 @@ enum {
>  	Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota, Opt_prjquota,
>  	Opt_uquota, Opt_gquota, Opt_pquota,
>  	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> -	Opt_discard, Opt_nodiscard, Opt_dax, Opt_err,
> +	Opt_discard, Opt_nodiscard, Opt_dax, Opt_err, Opt_discard_sync,
>  };
>  
>  static const match_table_t tokens = {
> @@ -129,6 +129,7 @@ static const match_table_t tokens = {
>  	{Opt_pqnoenforce,"pqnoenforce"},/* project quota limit enforcement */
>  	{Opt_qnoenforce, "qnoenforce"},	/* same as uqnoenforce */
>  	{Opt_discard,	"discard"},	/* Discard unused blocks */
> +	{Opt_discard_sync,"discard_sync"},/* Discard unused blocks sync */
>  	{Opt_nodiscard,	"nodiscard"},	/* Do not discard unused blocks */
>  
>  	{Opt_dax,	"dax"},		/* Enable direct access to bdev pages */
> @@ -363,11 +364,15 @@ xfs_parseargs(
>  			mp->m_qflags |= (XFS_GQUOTA_ACCT | XFS_GQUOTA_ACTIVE);
>  			mp->m_qflags &= ~XFS_GQUOTA_ENFD;
>  			break;
> +		case Opt_discard_sync:
> +			mp->m_flags |= XFS_MOUNT_DISCARD_SYNC;
> +			/* fall through and set XFS_MOUNT_DISCARD too */
>  		case Opt_discard:
>  			mp->m_flags |= XFS_MOUNT_DISCARD;
>  			break;
>  		case Opt_nodiscard:
>  			mp->m_flags &= ~XFS_MOUNT_DISCARD;
> +			mp->m_flags &= ~XFS_MOUNT_DISCARD_SYNC;
>  			break;
>  #ifdef CONFIG_FS_DAX
>  		case Opt_dax:
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-04-30 17:19 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-30 15:32 [PATCHSET 0/2] sync discard Jens Axboe
2018-04-30 15:32 ` [PATCH 1/2] block: add BLKDEV_DISCARD_SYNC flag Jens Axboe
2018-04-30 15:32 ` [PATCH 2/2] xfs: add 'discard_sync' mount flag Jens Axboe
2018-04-30 17:19   ` Brian Foster [this message]
2018-04-30 18:07     ` Jens Axboe
2018-04-30 18:25       ` Luis R. Rodriguez
2018-04-30 18:31         ` Jens Axboe
2018-04-30 19:19         ` Eric Sandeen
2018-04-30 19:21           ` Jens Axboe
2018-04-30 19:57             ` Eric Sandeen
2018-04-30 19:58               ` Jens Axboe
2018-04-30 22:59                 ` Eric Sandeen
2018-04-30 23:02                   ` Jens Axboe
2018-04-30 19:18       ` Brian Foster
2018-04-30 21:31   ` Dave Chinner
2018-04-30 21:42     ` Jens Axboe
2018-04-30 22:28       ` Dave Chinner
2018-04-30 22:40         ` Jens Axboe
2018-04-30 23:00           ` Jens Axboe
2018-04-30 23:23             ` Dave Chinner
2018-05-01 11:11               ` Brian Foster
2018-05-01 15:23               ` Jens Axboe
2018-05-02  2:54                 ` Martin K. Petersen
2018-05-02 14:20                   ` Jens Axboe
2018-04-30 23:01           ` Darrick J. Wong
2018-05-02 12:45 ` [PATCHSET 0/2] sync discard Christoph Hellwig
2018-05-02 14:19   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180430171945.GB22176@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.