linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: John Garry <john.g.garry@oracle.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	chandan.babu@oracle.com, dchinner@redhat.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
	linux-api@vger.kernel.org
Subject: Re: [PATCH 15/21] fs: xfs: Support atomic write for statx
Date: Tue, 3 Oct 2023 14:32:16 +1100	[thread overview]
Message-ID: <ZRuLQKKPCzyUZtC9@dread.disaster.area> (raw)
In-Reply-To: <20230929102726.2985188-16-john.g.garry@oracle.com>

On Fri, Sep 29, 2023 at 10:27:20AM +0000, John Garry wrote:
> Support providing info on atomic write unit min and max for an inode.
> 
> For simplicity, currently we limit the min at the FS block size, but a
> lower limit could be supported in future.
> 
> The atomic write unit min and max is limited by the guaranteed extent
> alignment for the inode.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/xfs_iops.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_iops.h |  4 ++++
>  2 files changed, 55 insertions(+)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 1c1e6171209d..5bff80748223 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -546,6 +546,46 @@ xfs_stat_blksize(
>  	return PAGE_SIZE;
>  }
>  
> +void xfs_ip_atomic_write_attr(struct xfs_inode *ip,
> +			xfs_filblks_t *unit_min_fsb,
> +			xfs_filblks_t *unit_max_fsb)

Formatting.

Also, we don't use variable name shorthand for function names -
xfs_get_atomic_write_hint(ip) to match xfs_get_extsz_hint(ip)
would be appropriate, right?



> +{
> +	xfs_extlen_t		extsz_hint = xfs_get_extsz_hint(ip);
> +	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> +	struct block_device	*bdev = target->bt_bdev;
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_filblks_t		atomic_write_unit_min,
> +				atomic_write_unit_max,
> +				align;
> +
> +	atomic_write_unit_min = XFS_B_TO_FSB(mp,
> +		queue_atomic_write_unit_min_bytes(bdev->bd_queue));
> +	atomic_write_unit_max = XFS_B_TO_FSB(mp,
> +		queue_atomic_write_unit_max_bytes(bdev->bd_queue));

These should be set in the buftarg at mount time, like we do with
sector size masks. Then we don't need to convert them to fsbs on
every single lookup.

> +	/* for RT, unset extsize gives hint of 1 */
> +	/* for !RT, unset extsize gives hint of 0 */
> +	if (extsz_hint && (XFS_IS_REALTIME_INODE(ip) ||
> +	    (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)))

Logic is non-obvious. The compound is (rt || force), not
(extsz && rt), so it took me a while to actually realise I read this
incorrectly.

	if (extsz_hint &&
	    (XFS_IS_REALTIME_INODE(ip) ||
	     (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))) {

> +		align = extsz_hint;
> +	else
> +		align = 1;

And now the logic looks wrong to me. We don't want to use extsz hint
for RT inodes if force align is not set, this will always use it
regardless of the fact it has nothing to do with force alignment.

Indeed, if XFS_DIFLAG2_FORCEALIGN is not set, then shouldn't this
always return min/max = 0 because atomic alignments are not in us on
this inode?

i.e. the first thing this code should do is:

	*unit_min_fsb = 0;
	*unit_max_fsb = 0;
	if (!(ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))
		return;

Then we can check device support:

	if (!buftarg->bt_atomic_write_max)
		return;

Then we can check for extent size hints. If that's not set:

	align = xfs_get_extsz_hint(ip);
	if (align <= 1) {
		unit_min_fsb = 1;
		unit_max_fsb = 1;
		return;
	}

And finally, if there is an extent size hint, we can return that.

> +	if (atomic_write_unit_max == 0) {
> +		*unit_min_fsb = 0;
> +		*unit_max_fsb = 0;
> +	} else if (atomic_write_unit_min == 0) {
> +		*unit_min_fsb = 1;
> +		*unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
> +					align);

Why is it valid for a device to have a zero minimum size? If it can
set a maximum, it should -always- set a minimum size as logical
sector size is a valid lower bound, yes?

> +	} else {
> +		*unit_min_fsb = min_t(xfs_filblks_t, atomic_write_unit_min,
> +					align);
> +		*unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
> +					align);
> +	}

Nothing here guarantees the power-of-2 sizes that the RWF_ATOMIC
user interface requires....

It also doesn't check that the extent size hint is aligned with
atomic write units.

It also doesn't check either against stripe unit alignment....

> +}
> +
>  STATIC int
>  xfs_vn_getattr(
>  	struct mnt_idmap	*idmap,
> @@ -614,6 +654,17 @@ xfs_vn_getattr(
>  			stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
>  			stat->dio_offset_align = bdev_logical_block_size(bdev);
>  		}
> +		if (request_mask & STATX_WRITE_ATOMIC) {
> +			xfs_filblks_t unit_min_fsb, unit_max_fsb;
> +
> +			xfs_ip_atomic_write_attr(ip, &unit_min_fsb,
> +				&unit_max_fsb);
> +			stat->atomic_write_unit_min = XFS_FSB_TO_B(mp, unit_min_fsb);
> +			stat->atomic_write_unit_max = XFS_FSB_TO_B(mp, unit_max_fsb);

That's just nasty. We pull byte units from the bdev, convert them to
fsb to round them, then convert them back to byte counts. We should
be doing all the work in one set of units....

> +			stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
> +			stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
> +			stat->result_mask |= STATX_WRITE_ATOMIC;

If the min/max are zero, then atomic writes are not supported on
this inode, right? Why would we set any of the attributes or result
mask to say it is supported on this file?


-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2023-10-03  3:32 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-29 10:27 [PATCH 00/21] block atomic writes John Garry
2023-09-29 10:27 ` [PATCH 01/21] block: Add atomic write operations to request_queue limits John Garry
2023-10-03 16:40   ` Bart Van Assche
2023-10-04  3:00     ` Martin K. Petersen
2023-10-04 17:28       ` Bart Van Assche
2023-10-04 18:26         ` Martin K. Petersen
2023-10-04 21:00       ` Bart Van Assche
2023-10-05  8:22         ` John Garry
2023-11-09 15:10   ` Christoph Hellwig
2023-11-09 17:01     ` John Garry
2023-11-10  6:23       ` Christoph Hellwig
2023-11-10  9:04         ` John Garry
2023-09-29 10:27 ` [PATCH 02/21] block: Limit atomic writes according to bio and queue limits John Garry
2023-11-09 15:13   ` Christoph Hellwig
2023-11-09 17:41     ` John Garry
2023-12-04  3:19   ` Ming Lei
2023-12-04  3:55     ` Ming Lei
2023-12-04  9:35       ` John Garry
2023-09-29 10:27 ` [PATCH 03/21] fs/bdev: Add atomic write support info to statx John Garry
2023-09-29 22:49   ` Eric Biggers
2023-10-01 13:23     ` Bart Van Assche
2023-10-02  9:51       ` John Garry
2023-10-02 18:39         ` Bart Van Assche
2023-10-03  0:28           ` Martin K. Petersen
2023-11-09 15:15             ` Christoph Hellwig
2023-10-03  1:51         ` Dave Chinner
2023-10-03  2:57           ` Darrick J. Wong
2023-10-03  7:23             ` John Garry
2023-10-03 15:46               ` Darrick J. Wong
2023-10-04 14:19                 ` John Garry
2023-09-29 10:27 ` [PATCH 04/21] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support John Garry
2023-10-06 18:15   ` Jeremy Bongio
2023-10-09 22:02     ` Dave Chinner
2023-09-29 10:27 ` [PATCH 05/21] block: Add REQ_ATOMIC flag John Garry
2023-09-29 10:27 ` [PATCH 06/21] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2023-09-29 10:27 ` [PATCH 07/21] block: Limit atomic write IO size according to atomic_write_max_sectors John Garry
2023-09-29 10:27 ` [PATCH 08/21] block: Error an attempt to split an atomic write bio John Garry
2023-09-29 10:27 ` [PATCH 09/21] block: Add checks to merging of atomic writes John Garry
2023-09-30 13:40   ` kernel test robot
2023-10-02 22:50     ` Nathan Chancellor
2023-10-04 11:40       ` John Garry
2023-09-29 10:27 ` [PATCH 10/21] block: Add fops atomic write support John Garry
2023-09-29 17:51   ` Bart Van Assche
2023-10-02 10:10     ` John Garry
2023-10-02 19:12       ` Bart Van Assche
2023-10-03  0:48         ` Martin K. Petersen
2023-10-03 16:55           ` Bart Van Assche
2023-10-04  2:53             ` Martin K. Petersen
2023-10-04 17:22               ` Bart Van Assche
2023-10-04 18:17                 ` Martin K. Petersen
2023-10-05 17:10                   ` Bart Van Assche
2023-10-05 22:36                     ` Dave Chinner
2023-10-05 22:58                       ` Bart Van Assche
2023-10-06  4:31                         ` Dave Chinner
2023-10-06 17:22                           ` Bart Van Assche
2023-10-07  1:21                             ` Martin K. Petersen
2023-10-03  8:37         ` John Garry
2023-10-03 16:45           ` Bart Van Assche
2023-10-04  9:14             ` John Garry
2023-10-04 17:34               ` Bart Van Assche
2023-10-04 21:59                 ` Dave Chinner
2023-12-04  2:30   ` Ming Lei
2023-12-04  9:27     ` John Garry
2023-12-04 12:18       ` Ming Lei
2023-12-04 13:13         ` John Garry
2023-12-05  1:45           ` Ming Lei
2023-12-05 10:49             ` John Garry
2023-09-29 10:27 ` [PATCH 11/21] fs: xfs: Don't use low-space allocator for alignment > 1 John Garry
2023-10-03  1:16   ` Dave Chinner
2023-10-03  3:00     ` Darrick J. Wong
2023-10-03  4:34       ` Dave Chinner
2023-10-03 10:22       ` John Garry
2023-09-29 10:27 ` [PATCH 12/21] fs: xfs: Introduce FORCEALIGN inode flag John Garry
2023-11-09 15:24   ` Christoph Hellwig
2023-09-29 10:27 ` [PATCH 13/21] fs: xfs: Make file data allocations observe the 'forcealign' flag John Garry
2023-10-03  1:42   ` Dave Chinner
2023-10-03 10:13     ` John Garry
2023-09-29 10:27 ` [PATCH 14/21] fs: xfs: Enable file data forcealign feature John Garry
2023-09-29 10:27 ` [PATCH 15/21] fs: xfs: Support atomic write for statx John Garry
2023-10-03  3:32   ` Dave Chinner [this message]
2023-10-03 10:56     ` John Garry
2023-10-03 16:10       ` Darrick J. Wong
2023-09-29 10:27 ` [PATCH 16/21] fs: iomap: Atomic write support John Garry
2023-10-03  4:24   ` Dave Chinner
2023-10-03 12:55     ` John Garry
2023-10-03 16:47     ` Darrick J. Wong
2023-10-04  1:16       ` Dave Chinner
2023-10-24 12:59     ` John Garry
2023-09-29 10:27 ` [PATCH 17/21] fs: xfs: iomap atomic " John Garry
2023-11-09 15:26   ` Christoph Hellwig
2023-11-10 10:42     ` John Garry
2023-11-28  8:56       ` John Garry
2023-11-28 13:56         ` Christoph Hellwig
2023-11-28 17:42           ` John Garry
2023-11-29  2:45             ` Martin K. Petersen
2023-12-04 13:45             ` Christoph Hellwig
2023-12-04 15:19               ` John Garry
2023-12-04 15:39                 ` Christoph Hellwig
2023-12-04 18:06                   ` John Garry
2023-12-05  4:55                 ` Theodore Ts'o
2023-12-05 11:09                   ` John Garry
2023-12-05 13:59                 ` Ming Lei
2023-09-29 10:27 ` [PATCH 18/21] scsi: sd: Support reading atomic properties from block limits VPD John Garry
2023-09-29 17:54   ` Bart Van Assche
2023-10-02 11:27     ` John Garry
2023-10-06 17:52       ` Bart Van Assche
2023-10-06 23:48         ` Martin K. Petersen
2023-09-29 10:27 ` [PATCH 19/21] scsi: sd: Add WRITE_ATOMIC_16 support John Garry
2023-09-29 17:59   ` Bart Van Assche
2023-10-02 11:36     ` John Garry
2023-10-02 19:21       ` Bart Van Assche
2023-09-29 10:27 ` [PATCH 20/21] scsi: scsi_debug: Atomic write support John Garry
2023-09-29 10:27 ` [PATCH 21/21] nvme: Support atomic writes John Garry
     [not found]   ` <CGME20231004113943eucas1p23a51ce5ef06c36459f826101bb7b85fc@eucas1p2.samsung.com>
2023-10-04 11:39     ` Pankaj Raghav
2023-10-05 10:24       ` John Garry
2023-10-05 13:32         ` Pankaj Raghav
2023-10-05 15:05           ` John Garry
2023-11-09 15:36   ` Christoph Hellwig
2023-11-09 15:42     ` Matthew Wilcox
2023-11-09 15:46       ` Christoph Hellwig
2023-11-09 19:08         ` John Garry
2023-11-10  6:29           ` Christoph Hellwig
2023-11-10  8:44             ` John Garry
2023-09-29 14:58 ` [PATCH 00/21] block " Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRuLQKKPCzyUZtC9@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jbongio@google.com \
    --cc=jejb@linux.ibm.com \
    --cc=john.g.garry@oracle.com \
    --cc=kbusch@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).