linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	chandan.babu@oracle.com, dchinner@redhat.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
	linux-api@vger.kernel.org
Subject: Re: [PATCH 15/21] fs: xfs: Support atomic write for statx
Date: Tue, 3 Oct 2023 11:56:52 +0100	[thread overview]
Message-ID: <9be14161-907e-92f6-d214-11df00693fac@oracle.com> (raw)
In-Reply-To: <ZRuLQKKPCzyUZtC9@dread.disaster.area>

On 03/10/2023 04:32, Dave Chinner wrote:
> On Fri, Sep 29, 2023 at 10:27:20AM +0000, John Garry wrote:
>> Support providing info on atomic write unit min and max for an inode.
>>
>> For simplicity, currently we limit the min at the FS block size, but a
>> lower limit could be supported in future.
>>
>> The atomic write unit min and max is limited by the guaranteed extent
>> alignment for the inode.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/xfs/xfs_iops.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/xfs/xfs_iops.h |  4 ++++
>>   2 files changed, 55 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index 1c1e6171209d..5bff80748223 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -546,6 +546,46 @@ xfs_stat_blksize(
>>   	return PAGE_SIZE;
>>   }
>>   
>> +void xfs_ip_atomic_write_attr(struct xfs_inode *ip,
>> +			xfs_filblks_t *unit_min_fsb,
>> +			xfs_filblks_t *unit_max_fsb)
> 
> Formatting.

Change args to 1x tab indent, right?

> 
> Also, we don't use variable name shorthand for function names -
> xfs_get_atomic_write_hint(ip) to match xfs_get_extsz_hint(ip)
> would be appropriate, right?

Changing the name format would be ok. However we are not returning a 
hint, but rather the inode atomic write unit min and max values in FS 
blocks. Anyway, I'll look to rework the name.

> 
> 
> 
>> +{
>> +	xfs_extlen_t		extsz_hint = xfs_get_extsz_hint(ip);
>> +	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>> +	struct block_device	*bdev = target->bt_bdev;
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +	xfs_filblks_t		atomic_write_unit_min,
>> +				atomic_write_unit_max,
>> +				align;
>> +
>> +	atomic_write_unit_min = XFS_B_TO_FSB(mp,
>> +		queue_atomic_write_unit_min_bytes(bdev->bd_queue));
>> +	atomic_write_unit_max = XFS_B_TO_FSB(mp,
>> +		queue_atomic_write_unit_max_bytes(bdev->bd_queue));
> 
> These should be set in the buftarg at mount time, like we do with
> sector size masks. Then we don't need to convert them to fsbs on
> every single lookup.

ok, fine. However I do still have a doubt on whether these values should 
be changeable - please see (small) comment about 
atomic_write_max_sectors in patch 7/21

> 
>> +	/* for RT, unset extsize gives hint of 1 */
>> +	/* for !RT, unset extsize gives hint of 0 */
>> +	if (extsz_hint && (XFS_IS_REALTIME_INODE(ip) ||
>> +	    (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)))
> 
> Logic is non-obvious. The compound is (rt || force), not
> (extsz && rt), so it took me a while to actually realise I read this
> incorrectly.
> 
> 	if (extsz_hint &&
> 	    (XFS_IS_REALTIME_INODE(ip) ||
> 	     (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))) {
> 
>> +		align = extsz_hint;
>> +	else
>> +		align = 1;
> 
> And now the logic looks wrong to me. We don't want to use extsz hint
> for RT inodes if force align is not set, this will always use it
> regardless of the fact it has nothing to do with force alignment.

extsz_hint comes from xfs_get_extsz_hint(), which gives us the SB 
extsize for the RT inode and this alignment is guaranteed, no?

> 
> Indeed, if XFS_DIFLAG2_FORCEALIGN is not set, then shouldn't this
> always return min/max = 0 because atomic alignments are not in us on
> this inode?

As above, for RT I thought that extsize alignment was guaranteed and we 
don't need to bother with XFS_DIFLAG2_FORCEALIGN there.

> 
> i.e. the first thing this code should do is:
> 
> 	*unit_min_fsb = 0;
> 	*unit_max_fsb = 0;
> 	if (!(ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))
> 		return;
> 
> Then we can check device support:
> 
> 	if (!buftarg->bt_atomic_write_max)
> 		return;
> 
> Then we can check for extent size hints. If that's not set:
> 
> 	align = xfs_get_extsz_hint(ip);
> 	if (align <= 1) {
> 		unit_min_fsb = 1;
> 		unit_max_fsb = 1;
> 		return;
> 	}
> 
> And finally, if there is an extent size hint, we can return that.
> 
>> +	if (atomic_write_unit_max == 0) {
>> +		*unit_min_fsb = 0;
>> +		*unit_max_fsb = 0;
>> +	} else if (atomic_write_unit_min == 0) {
>> +		*unit_min_fsb = 1;
>> +		*unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
>> +					align);
> 
> Why is it valid for a device to have a zero minimum size?

It's not valid. Local variables atomic_write_unit_max and 
atomic_write_unit_min unit here is FS blocks - maybe I should change names.

The idea is that for simplicity we won't support atomic writes for XFS 
of size less than 1x FS block initially. So if the bdev has - for 
example - queue_atomic_write_unit_min_bytes() == 2K and 
queue_atomic_write_unit_max_bytes() == 64K, then (ignoring alignment) we 
say that unit_min_fsb = 1 and unit_max_fsb = 16 (for 4K FS blocks).

> If it can
> set a maximum, it should -always- set a minimum size as logical
> sector size is a valid lower bound, yes?
> 
>> +	} else {
>> +		*unit_min_fsb = min_t(xfs_filblks_t, atomic_write_unit_min,
>> +					align);
>> +		*unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
>> +					align);
>> +	}
> 
> Nothing here guarantees the power-of-2 sizes that the RWF_ATOMIC
> user interface requires....

atomic_write_unit_min and atomic_write_unit_max will be powers-of-2 (or 0).

But, you are right, we don't check align is a power-of-2 - that can be 
added.

> 
> It also doesn't check that the extent size hint is aligned with
> atomic write units.

If we add a check for align being a power-of-2 and atomic_write_unit_min 
and atomic_write_unit_max are already powers-of-2, then this can be 
relied on, right?

> 
> It also doesn't check either against stripe unit alignment....

As mentioned in earlier response, this could be enforced.

> 
>> +}
>> +
>>   STATIC int
>>   xfs_vn_getattr(
>>   	struct mnt_idmap	*idmap,
>> @@ -614,6 +654,17 @@ xfs_vn_getattr(
>>   			stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
>>   			stat->dio_offset_align = bdev_logical_block_size(bdev);
>>   		}
>> +		if (request_mask & STATX_WRITE_ATOMIC) {
>> +			xfs_filblks_t unit_min_fsb, unit_max_fsb;
>> +
>> +			xfs_ip_atomic_write_attr(ip, &unit_min_fsb,
>> +				&unit_max_fsb);
>> +			stat->atomic_write_unit_min = XFS_FSB_TO_B(mp, unit_min_fsb);
>> +			stat->atomic_write_unit_max = XFS_FSB_TO_B(mp, unit_max_fsb);
> 
> That's just nasty. We pull byte units from the bdev, convert them to
> fsb to round them, then convert them back to byte counts. We should
> be doing all the work in one set of units....

ok, agreed. bytes is probably best.

> 
>> +			stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
>> +			stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
>> +			stat->result_mask |= STATX_WRITE_ATOMIC;
> 
> If the min/max are zero, then atomic writes are not supported on
> this inode, right? Why would we set any of the attributes or result
> mask to say it is supported on this file?

ok, we won't set STATX_ATTR_WRITE_ATOMIC for min/max are zero

Thanks,
John

  reply	other threads:[~2023-10-03 10:57 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-29 10:27 [PATCH 00/21] block atomic writes John Garry
2023-09-29 10:27 ` [PATCH 01/21] block: Add atomic write operations to request_queue limits John Garry
2023-10-03 16:40   ` Bart Van Assche
2023-10-04  3:00     ` Martin K. Petersen
2023-10-04 17:28       ` Bart Van Assche
2023-10-04 18:26         ` Martin K. Petersen
2023-10-04 21:00       ` Bart Van Assche
2023-10-05  8:22         ` John Garry
2023-11-09 15:10   ` Christoph Hellwig
2023-11-09 17:01     ` John Garry
2023-11-10  6:23       ` Christoph Hellwig
2023-11-10  9:04         ` John Garry
2023-09-29 10:27 ` [PATCH 02/21] block: Limit atomic writes according to bio and queue limits John Garry
2023-11-09 15:13   ` Christoph Hellwig
2023-11-09 17:41     ` John Garry
2023-12-04  3:19   ` Ming Lei
2023-12-04  3:55     ` Ming Lei
2023-12-04  9:35       ` John Garry
2023-09-29 10:27 ` [PATCH 03/21] fs/bdev: Add atomic write support info to statx John Garry
2023-09-29 22:49   ` Eric Biggers
2023-10-01 13:23     ` Bart Van Assche
2023-10-02  9:51       ` John Garry
2023-10-02 18:39         ` Bart Van Assche
2023-10-03  0:28           ` Martin K. Petersen
2023-11-09 15:15             ` Christoph Hellwig
2023-10-03  1:51         ` Dave Chinner
2023-10-03  2:57           ` Darrick J. Wong
2023-10-03  7:23             ` John Garry
2023-10-03 15:46               ` Darrick J. Wong
2023-10-04 14:19                 ` John Garry
2023-09-29 10:27 ` [PATCH 04/21] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support John Garry
2023-10-06 18:15   ` Jeremy Bongio
2023-10-09 22:02     ` Dave Chinner
2023-09-29 10:27 ` [PATCH 05/21] block: Add REQ_ATOMIC flag John Garry
2023-09-29 10:27 ` [PATCH 06/21] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2023-09-29 10:27 ` [PATCH 07/21] block: Limit atomic write IO size according to atomic_write_max_sectors John Garry
2023-09-29 10:27 ` [PATCH 08/21] block: Error an attempt to split an atomic write bio John Garry
2023-09-29 10:27 ` [PATCH 09/21] block: Add checks to merging of atomic writes John Garry
2023-09-30 13:40   ` kernel test robot
2023-10-02 22:50     ` Nathan Chancellor
2023-10-04 11:40       ` John Garry
2023-09-29 10:27 ` [PATCH 10/21] block: Add fops atomic write support John Garry
2023-09-29 17:51   ` Bart Van Assche
2023-10-02 10:10     ` John Garry
2023-10-02 19:12       ` Bart Van Assche
2023-10-03  0:48         ` Martin K. Petersen
2023-10-03 16:55           ` Bart Van Assche
2023-10-04  2:53             ` Martin K. Petersen
2023-10-04 17:22               ` Bart Van Assche
2023-10-04 18:17                 ` Martin K. Petersen
2023-10-05 17:10                   ` Bart Van Assche
2023-10-05 22:36                     ` Dave Chinner
2023-10-05 22:58                       ` Bart Van Assche
2023-10-06  4:31                         ` Dave Chinner
2023-10-06 17:22                           ` Bart Van Assche
2023-10-07  1:21                             ` Martin K. Petersen
2023-10-03  8:37         ` John Garry
2023-10-03 16:45           ` Bart Van Assche
2023-10-04  9:14             ` John Garry
2023-10-04 17:34               ` Bart Van Assche
2023-10-04 21:59                 ` Dave Chinner
2023-12-04  2:30   ` Ming Lei
2023-12-04  9:27     ` John Garry
2023-12-04 12:18       ` Ming Lei
2023-12-04 13:13         ` John Garry
2023-12-05  1:45           ` Ming Lei
2023-12-05 10:49             ` John Garry
2023-09-29 10:27 ` [PATCH 11/21] fs: xfs: Don't use low-space allocator for alignment > 1 John Garry
2023-10-03  1:16   ` Dave Chinner
2023-10-03  3:00     ` Darrick J. Wong
2023-10-03  4:34       ` Dave Chinner
2023-10-03 10:22       ` John Garry
2023-09-29 10:27 ` [PATCH 12/21] fs: xfs: Introduce FORCEALIGN inode flag John Garry
2023-11-09 15:24   ` Christoph Hellwig
2023-09-29 10:27 ` [PATCH 13/21] fs: xfs: Make file data allocations observe the 'forcealign' flag John Garry
2023-10-03  1:42   ` Dave Chinner
2023-10-03 10:13     ` John Garry
2023-09-29 10:27 ` [PATCH 14/21] fs: xfs: Enable file data forcealign feature John Garry
2023-09-29 10:27 ` [PATCH 15/21] fs: xfs: Support atomic write for statx John Garry
2023-10-03  3:32   ` Dave Chinner
2023-10-03 10:56     ` John Garry [this message]
2023-10-03 16:10       ` Darrick J. Wong
2023-09-29 10:27 ` [PATCH 16/21] fs: iomap: Atomic write support John Garry
2023-10-03  4:24   ` Dave Chinner
2023-10-03 12:55     ` John Garry
2023-10-03 16:47     ` Darrick J. Wong
2023-10-04  1:16       ` Dave Chinner
2023-10-24 12:59     ` John Garry
2023-09-29 10:27 ` [PATCH 17/21] fs: xfs: iomap atomic " John Garry
2023-11-09 15:26   ` Christoph Hellwig
2023-11-10 10:42     ` John Garry
2023-11-28  8:56       ` John Garry
2023-11-28 13:56         ` Christoph Hellwig
2023-11-28 17:42           ` John Garry
2023-11-29  2:45             ` Martin K. Petersen
2023-12-04 13:45             ` Christoph Hellwig
2023-12-04 15:19               ` John Garry
2023-12-04 15:39                 ` Christoph Hellwig
2023-12-04 18:06                   ` John Garry
2023-12-05  4:55                 ` Theodore Ts'o
2023-12-05 11:09                   ` John Garry
2023-12-05 13:59                 ` Ming Lei
2023-09-29 10:27 ` [PATCH 18/21] scsi: sd: Support reading atomic properties from block limits VPD John Garry
2023-09-29 17:54   ` Bart Van Assche
2023-10-02 11:27     ` John Garry
2023-10-06 17:52       ` Bart Van Assche
2023-10-06 23:48         ` Martin K. Petersen
2023-09-29 10:27 ` [PATCH 19/21] scsi: sd: Add WRITE_ATOMIC_16 support John Garry
2023-09-29 17:59   ` Bart Van Assche
2023-10-02 11:36     ` John Garry
2023-10-02 19:21       ` Bart Van Assche
2023-09-29 10:27 ` [PATCH 20/21] scsi: scsi_debug: Atomic write support John Garry
2023-09-29 10:27 ` [PATCH 21/21] nvme: Support atomic writes John Garry
     [not found]   ` <CGME20231004113943eucas1p23a51ce5ef06c36459f826101bb7b85fc@eucas1p2.samsung.com>
2023-10-04 11:39     ` Pankaj Raghav
2023-10-05 10:24       ` John Garry
2023-10-05 13:32         ` Pankaj Raghav
2023-10-05 15:05           ` John Garry
2023-11-09 15:36   ` Christoph Hellwig
2023-11-09 15:42     ` Matthew Wilcox
2023-11-09 15:46       ` Christoph Hellwig
2023-11-09 19:08         ` John Garry
2023-11-10  6:29           ` Christoph Hellwig
2023-11-10  8:44             ` John Garry
2023-09-29 14:58 ` [PATCH 00/21] block " Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9be14161-907e-92f6-d214-11df00693fac@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jbongio@google.com \
    --cc=jejb@linux.ibm.com \
    --cc=kbusch@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).