linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: John Garry <john.g.garry@oracle.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	chandan.babu@oracle.com, dchinner@redhat.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
	linux-api@vger.kernel.org
Subject: Re: [PATCH 13/21] fs: xfs: Make file data allocations observe the 'forcealign' flag
Date: Tue, 3 Oct 2023 12:42:53 +1100	[thread overview]
Message-ID: <ZRtxnc7LpOlxZnON@dread.disaster.area> (raw)
In-Reply-To: <20230929102726.2985188-14-john.g.garry@oracle.com>

On Fri, Sep 29, 2023 at 10:27:18AM +0000, John Garry wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> The existing extsize hint code already did the work of expanding file
> range mapping requests so that the range is aligned to the hint value.
> Now add the code we need to guarantee that the space allocations are
> also always aligned.
> 
> XXX: still need to check all this with reflink
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Co-developed-by: John Garry <john.g.garry@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 22 +++++++++++++++++-----
>  fs/xfs/xfs_iomap.c       |  4 +++-
>  2 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 328134c22104..6c864dc0a6ff 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3328,6 +3328,19 @@ xfs_bmap_compute_alignments(
>  		align = xfs_get_cowextsz_hint(ap->ip);
>  	else if (ap->datatype & XFS_ALLOC_USERDATA)
>  		align = xfs_get_extsz_hint(ap->ip);
> +
> +	/*
> +	 * xfs_get_cowextsz_hint() returns extsz_hint for when forcealign is
> +	 * set as forcealign and cowextsz_hint are mutually exclusive
> +	 */
> +	if (xfs_inode_forcealign(ap->ip) && align) {
> +		args->alignment = align;
> +		if (stripe_align % align)
> +			stripe_align = align;
> +	} else {
> +		args->alignment = 1;
> +	}

This smells wrong.

If a filesystem has a stripe unit set (hence stripe_align is
non-zero) then any IO that crosses stripe unit boundaries will not
be atomic - they will require multiple IOs to different devices.

Hence if the filesystem has a stripe unit set, then all forced
alignment hints for atomic IO *must* be an exact integer divider
of the stripe unit. hence when an atomic IO bundle is aligned, the
atomic boundaries within the bundle always fall on a stripe unit
boundary and never cross devices.

IOWs, for a striped filesystem, the maximum size/alignment for a
single atomic IO unit is the stripe unit.

This should be enforced when the forced align flag is set on the
inode (i.e. from the ioctl)


> +
>  	if (align) {
>  		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
>  					ap->eof, 0, ap->conv, &ap->offset,
> @@ -3423,7 +3436,6 @@ xfs_bmap_exact_minlen_extent_alloc(
>  	args.minlen = args.maxlen = ap->minlen;
>  	args.total = ap->total;
>  
> -	args.alignment = 1;
>  	args.minalignslop = 0;
>  
>  	args.minleft = ap->minleft;
> @@ -3469,6 +3481,7 @@ xfs_bmap_btalloc_at_eof(
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*caller_pag = args->pag;
> +	int			orig_alignment = args->alignment;
>  	int			error;
>  
>  	/*
> @@ -3543,10 +3556,10 @@ xfs_bmap_btalloc_at_eof(
>  
>  	/*
>  	 * Allocation failed, so turn return the allocation args to their
> -	 * original non-aligned state so the caller can proceed on allocation
> -	 * failure as if this function was never called.
> +	 * original state so the caller can proceed on allocation failure as
> +	 * if this function was never called.
>  	 */
> -	args->alignment = 1;
> +	args->alignment = orig_alignment;
>  	return 0;
>  }

Urk. Not sure that is right, it's certainly a change of behaviour.

> @@ -3694,7 +3707,6 @@ xfs_bmap_btalloc(
>  		.wasdel		= ap->wasdel,
>  		.resv		= XFS_AG_RESV_NONE,
>  		.datatype	= ap->datatype,
> -		.alignment	= 1,
>  		.minalignslop	= 0,
>  	};
>  	xfs_fileoff_t		orig_offset;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 18c8f168b153..70fe873951f3 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -181,7 +181,9 @@ xfs_eof_alignment(
>  		 * If mounted with the "-o swalloc" option the alignment is
>  		 * increased from the strip unit size to the stripe width.
>  		 */
> -		if (mp->m_swidth && xfs_has_swalloc(mp))
> +		if (xfs_inode_forcealign(ip))
> +			align = xfs_get_extsz_hint(ip);
> +		else if (mp->m_swidth && xfs_has_swalloc(mp))
>  			align = mp->m_swidth;
>  		else if (mp->m_dalign)
>  			align = mp->m_dalign;

Ah. Now I see. This abuses the stripe alignment code to try to
implement this new inode allocation alignment restriction, rather
than just making the extent size hint alignment mandatory....

Yeah, this can be done better... :)

As it is, I have been working on a series that reworks all this
allocator code to separate out the aligned IO from the exact EOF
allocation case to help clean this up for better perag selection
during allocation. I think that needs to be done first before we go
making the alignment code more intricate like this....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2023-10-03  1:43 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-29 10:27 [PATCH 00/21] block atomic writes John Garry
2023-09-29 10:27 ` [PATCH 01/21] block: Add atomic write operations to request_queue limits John Garry
2023-10-03 16:40   ` Bart Van Assche
2023-10-04  3:00     ` Martin K. Petersen
2023-10-04 17:28       ` Bart Van Assche
2023-10-04 18:26         ` Martin K. Petersen
2023-10-04 21:00       ` Bart Van Assche
2023-10-05  8:22         ` John Garry
2023-11-09 15:10   ` Christoph Hellwig
2023-11-09 17:01     ` John Garry
2023-11-10  6:23       ` Christoph Hellwig
2023-11-10  9:04         ` John Garry
2023-09-29 10:27 ` [PATCH 02/21] block: Limit atomic writes according to bio and queue limits John Garry
2023-11-09 15:13   ` Christoph Hellwig
2023-11-09 17:41     ` John Garry
2023-12-04  3:19   ` Ming Lei
2023-12-04  3:55     ` Ming Lei
2023-12-04  9:35       ` John Garry
2023-09-29 10:27 ` [PATCH 03/21] fs/bdev: Add atomic write support info to statx John Garry
2023-09-29 22:49   ` Eric Biggers
2023-10-01 13:23     ` Bart Van Assche
2023-10-02  9:51       ` John Garry
2023-10-02 18:39         ` Bart Van Assche
2023-10-03  0:28           ` Martin K. Petersen
2023-11-09 15:15             ` Christoph Hellwig
2023-10-03  1:51         ` Dave Chinner
2023-10-03  2:57           ` Darrick J. Wong
2023-10-03  7:23             ` John Garry
2023-10-03 15:46               ` Darrick J. Wong
2023-10-04 14:19                 ` John Garry
2023-09-29 10:27 ` [PATCH 04/21] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support John Garry
2023-10-06 18:15   ` Jeremy Bongio
2023-10-09 22:02     ` Dave Chinner
2023-09-29 10:27 ` [PATCH 05/21] block: Add REQ_ATOMIC flag John Garry
2023-09-29 10:27 ` [PATCH 06/21] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2023-09-29 10:27 ` [PATCH 07/21] block: Limit atomic write IO size according to atomic_write_max_sectors John Garry
2023-09-29 10:27 ` [PATCH 08/21] block: Error an attempt to split an atomic write bio John Garry
2023-09-29 10:27 ` [PATCH 09/21] block: Add checks to merging of atomic writes John Garry
2023-09-30 13:40   ` kernel test robot
2023-10-02 22:50     ` Nathan Chancellor
2023-10-04 11:40       ` John Garry
2023-09-29 10:27 ` [PATCH 10/21] block: Add fops atomic write support John Garry
2023-09-29 17:51   ` Bart Van Assche
2023-10-02 10:10     ` John Garry
2023-10-02 19:12       ` Bart Van Assche
2023-10-03  0:48         ` Martin K. Petersen
2023-10-03 16:55           ` Bart Van Assche
2023-10-04  2:53             ` Martin K. Petersen
2023-10-04 17:22               ` Bart Van Assche
2023-10-04 18:17                 ` Martin K. Petersen
2023-10-05 17:10                   ` Bart Van Assche
2023-10-05 22:36                     ` Dave Chinner
2023-10-05 22:58                       ` Bart Van Assche
2023-10-06  4:31                         ` Dave Chinner
2023-10-06 17:22                           ` Bart Van Assche
2023-10-07  1:21                             ` Martin K. Petersen
2023-10-03  8:37         ` John Garry
2023-10-03 16:45           ` Bart Van Assche
2023-10-04  9:14             ` John Garry
2023-10-04 17:34               ` Bart Van Assche
2023-10-04 21:59                 ` Dave Chinner
2023-12-04  2:30   ` Ming Lei
2023-12-04  9:27     ` John Garry
2023-12-04 12:18       ` Ming Lei
2023-12-04 13:13         ` John Garry
2023-12-05  1:45           ` Ming Lei
2023-12-05 10:49             ` John Garry
2023-09-29 10:27 ` [PATCH 11/21] fs: xfs: Don't use low-space allocator for alignment > 1 John Garry
2023-10-03  1:16   ` Dave Chinner
2023-10-03  3:00     ` Darrick J. Wong
2023-10-03  4:34       ` Dave Chinner
2023-10-03 10:22       ` John Garry
2023-09-29 10:27 ` [PATCH 12/21] fs: xfs: Introduce FORCEALIGN inode flag John Garry
2023-11-09 15:24   ` Christoph Hellwig
2023-09-29 10:27 ` [PATCH 13/21] fs: xfs: Make file data allocations observe the 'forcealign' flag John Garry
2023-10-03  1:42   ` Dave Chinner [this message]
2023-10-03 10:13     ` John Garry
2023-09-29 10:27 ` [PATCH 14/21] fs: xfs: Enable file data forcealign feature John Garry
2023-09-29 10:27 ` [PATCH 15/21] fs: xfs: Support atomic write for statx John Garry
2023-10-03  3:32   ` Dave Chinner
2023-10-03 10:56     ` John Garry
2023-10-03 16:10       ` Darrick J. Wong
2023-09-29 10:27 ` [PATCH 16/21] fs: iomap: Atomic write support John Garry
2023-10-03  4:24   ` Dave Chinner
2023-10-03 12:55     ` John Garry
2023-10-03 16:47     ` Darrick J. Wong
2023-10-04  1:16       ` Dave Chinner
2023-10-24 12:59     ` John Garry
2023-09-29 10:27 ` [PATCH 17/21] fs: xfs: iomap atomic " John Garry
2023-11-09 15:26   ` Christoph Hellwig
2023-11-10 10:42     ` John Garry
2023-11-28  8:56       ` John Garry
2023-11-28 13:56         ` Christoph Hellwig
2023-11-28 17:42           ` John Garry
2023-11-29  2:45             ` Martin K. Petersen
2023-12-04 13:45             ` Christoph Hellwig
2023-12-04 15:19               ` John Garry
2023-12-04 15:39                 ` Christoph Hellwig
2023-12-04 18:06                   ` John Garry
2023-12-05  4:55                 ` Theodore Ts'o
2023-12-05 11:09                   ` John Garry
2023-12-05 13:59                 ` Ming Lei
2023-09-29 10:27 ` [PATCH 18/21] scsi: sd: Support reading atomic properties from block limits VPD John Garry
2023-09-29 17:54   ` Bart Van Assche
2023-10-02 11:27     ` John Garry
2023-10-06 17:52       ` Bart Van Assche
2023-10-06 23:48         ` Martin K. Petersen
2023-09-29 10:27 ` [PATCH 19/21] scsi: sd: Add WRITE_ATOMIC_16 support John Garry
2023-09-29 17:59   ` Bart Van Assche
2023-10-02 11:36     ` John Garry
2023-10-02 19:21       ` Bart Van Assche
2023-09-29 10:27 ` [PATCH 20/21] scsi: scsi_debug: Atomic write support John Garry
2023-09-29 10:27 ` [PATCH 21/21] nvme: Support atomic writes John Garry
     [not found]   ` <CGME20231004113943eucas1p23a51ce5ef06c36459f826101bb7b85fc@eucas1p2.samsung.com>
2023-10-04 11:39     ` Pankaj Raghav
2023-10-05 10:24       ` John Garry
2023-10-05 13:32         ` Pankaj Raghav
2023-10-05 15:05           ` John Garry
2023-11-09 15:36   ` Christoph Hellwig
2023-11-09 15:42     ` Matthew Wilcox
2023-11-09 15:46       ` Christoph Hellwig
2023-11-09 19:08         ` John Garry
2023-11-10  6:29           ` Christoph Hellwig
2023-11-10  8:44             ` John Garry
2023-09-29 14:58 ` [PATCH 00/21] block " Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRtxnc7LpOlxZnON@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jbongio@google.com \
    --cc=jejb@linux.ibm.com \
    --cc=john.g.garry@oracle.com \
    --cc=kbusch@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).