From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
jejb@linux.ibm.com, martin.petersen@oracle.com,
djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
chandan.babu@oracle.com, dchinner@redhat.com,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
linux-api@vger.kernel.org
Subject: Re: [PATCH 15/21] fs: xfs: Support atomic write for statx
Date: Tue, 3 Oct 2023 11:56:52 +0100
Message-ID: <9be14161-907e-92f6-d214-11df00693fac@oracle.com>
In-Reply-To: <ZRuLQKKPCzyUZtC9@dread.disaster.area>
On 03/10/2023 04:32, Dave Chinner wrote:
> On Fri, Sep 29, 2023 at 10:27:20AM +0000, John Garry wrote:
>> Support providing info on atomic write unit min and max for an inode.
>>
>> For simplicity, currently we limit the min at the FS block size, but a
>> lower limit could be supported in future.
>>
>> The atomic write unit min and max is limited by the guaranteed extent
>> alignment for the inode.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>> fs/xfs/xfs_iops.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
>> fs/xfs/xfs_iops.h | 4 ++++
>> 2 files changed, 55 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index 1c1e6171209d..5bff80748223 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -546,6 +546,46 @@ xfs_stat_blksize(
>> return PAGE_SIZE;
>> }
>>
>> +void xfs_ip_atomic_write_attr(struct xfs_inode *ip,
>> + xfs_filblks_t *unit_min_fsb,
>> + xfs_filblks_t *unit_max_fsb)
>
> Formatting.
Change args to 1x tab indent, right?
>
> Also, we don't use variable name shorthand for function names -
> xfs_get_atomic_write_hint(ip) to match xfs_get_extsz_hint(ip)
> would be appropriate, right?
Changing the name format would be OK. However, we are not returning a
hint, but rather the inode's atomic write unit min and max values in FS
blocks. Anyway, I'll look to rework the name.
>
>
>
>> +{
>> + xfs_extlen_t extsz_hint = xfs_get_extsz_hint(ip);
>> + struct xfs_buftarg *target = xfs_inode_buftarg(ip);
>> + struct block_device *bdev = target->bt_bdev;
>> + struct xfs_mount *mp = ip->i_mount;
>> + xfs_filblks_t atomic_write_unit_min,
>> + atomic_write_unit_max,
>> + align;
>> +
>> + atomic_write_unit_min = XFS_B_TO_FSB(mp,
>> + queue_atomic_write_unit_min_bytes(bdev->bd_queue));
>> + atomic_write_unit_max = XFS_B_TO_FSB(mp,
>> + queue_atomic_write_unit_max_bytes(bdev->bd_queue));
>
> These should be set in the buftarg at mount time, like we do with
> sector size masks. Then we don't need to convert them to fsbs on
> every single lookup.
OK, fine. However, I still have a doubt about whether these values should
be changeable - please see my (small) comment about
atomic_write_max_sectors in patch 7/21.
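A minimal sketch of the "convert once at mount time" idea, in plain userspace C. The struct demo_buftarg and both helpers are hypothetical stand-ins, not the real struct xfs_buftarg or any XFS API; only the field name bt_atomic_write_max is borrowed from the review comments. The sketch assumes the device limits are whole multiples of the FS block size and does not address the open question of whether the limits can change after mount.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-device target (not struct xfs_buftarg). */
struct demo_buftarg {
	uint32_t bt_atomic_write_min;	/* cached, in FS blocks; 0 = unsupported */
	uint32_t bt_atomic_write_max;	/* cached, in FS blocks; 0 = unsupported */
};

/*
 * Called once at "mount": convert the device's byte limits to FS blocks
 * and cache the result, so lookups never repeat the conversion.
 * blocklog is log2 of the FS block size (12 for 4K blocks).
 */
static void demo_buftarg_cache_atomic(struct demo_buftarg *bt,
				      uint32_t dev_min_bytes,
				      uint32_t dev_max_bytes,
				      unsigned int blocklog)
{
	bt->bt_atomic_write_min = dev_min_bytes >> blocklog;
	bt->bt_atomic_write_max = dev_max_bytes >> blocklog;
}

/* Per-lookup path: just read the cached value. */
static uint32_t demo_buftarg_atomic_max(const struct demo_buftarg *bt)
{
	return bt->bt_atomic_write_max;
}
```

With a 4K FS block size (blocklog = 12), device limits of 4K and 64K would be cached as 1 and 16 FS blocks respectively.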
>
>> + /* for RT, unset extsize gives hint of 1 */
>> + /* for !RT, unset extsize gives hint of 0 */
>> + if (extsz_hint && (XFS_IS_REALTIME_INODE(ip) ||
>> + (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)))
>
> Logic is non-obvious. The compound is (rt || force), not
> (extsz && rt), so it took me a while to actually realise I read this
> incorrectly.
>
> if (extsz_hint &&
> (XFS_IS_REALTIME_INODE(ip) ||
> (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))) {
>
>> + align = extsz_hint;
>> + else
>> + align = 1;
>
> And now the logic looks wrong to me. We don't want to use extsz hint
> for RT inodes if force align is not set, this will always use it
> regardless of the fact it has nothing to do with force alignment.
extsz_hint comes from xfs_get_extsz_hint(), which gives us the SB
extsize for the RT inode and this alignment is guaranteed, no?
>
> Indeed, if XFS_DIFLAG2_FORCEALIGN is not set, then shouldn't this
> always return min/max = 0 because atomic alignments are not in use on
> this inode?
As above, for RT I thought that extsize alignment was guaranteed and we
don't need to bother with XFS_DIFLAG2_FORCEALIGN there.
>
> i.e. the first thing this code should do is:
>
> *unit_min_fsb = 0;
> *unit_max_fsb = 0;
> if (!(ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))
> return;
>
> Then we can check device support:
>
> if (!buftarg->bt_atomic_write_max)
> return;
>
> Then we can check for extent size hints. If that's not set:
>
> align = xfs_get_extsz_hint(ip);
> if (align <= 1) {
> *unit_min_fsb = 1;
> *unit_max_fsb = 1;
> return;
> }
>
> And finally, if there is an extent size hint, we can return that.
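The ordering of checks suggested above can be put together as one function. This is only a sketch in userspace C: struct demo_inode and its fields are hypothetical stand-ins for the inode flag, the extent size hint and the cached device limit. The final step is not spelled out in the review, so the sketch assumes min = 1 FS block and max = the extent size hint capped by the device maximum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the inputs; not a real XFS structure. */
struct demo_inode {
	bool forcealign;		/* stands in for XFS_DIFLAG2_FORCEALIGN */
	uint32_t extsz_hint_fsb;	/* extent size hint in FS blocks, 0 = unset */
	uint32_t dev_atomic_write_max;	/* device max in FS blocks, 0 = unsupported */
};

static void demo_atomic_write_units(const struct demo_inode *ip,
				    uint32_t *unit_min_fsb,
				    uint32_t *unit_max_fsb)
{
	/* Default: atomic writes not supported on this inode. */
	*unit_min_fsb = 0;
	*unit_max_fsb = 0;

	/* 1. No forced alignment: nothing is guaranteed. */
	if (!ip->forcealign)
		return;

	/* 2. No device support. */
	if (!ip->dev_atomic_write_max)
		return;

	/* 3. No usable extent size hint: a single FS block only. */
	if (ip->extsz_hint_fsb <= 1) {
		*unit_min_fsb = 1;
		*unit_max_fsb = 1;
		return;
	}

	/* 4. ASSUMPTION: cap the hint by what the device can do. */
	*unit_min_fsb = 1;
	*unit_max_fsb = ip->extsz_hint_fsb < ip->dev_atomic_write_max ?
			ip->extsz_hint_fsb : ip->dev_atomic_write_max;
}
```

Putting the early returns first keeps each condition independent, which avoids the misreading of the compound (extsz && (rt || force)) test discussed earlier.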
>
>> + if (atomic_write_unit_max == 0) {
>> + *unit_min_fsb = 0;
>> + *unit_max_fsb = 0;
>> + } else if (atomic_write_unit_min == 0) {
>> + *unit_min_fsb = 1;
>> + *unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
>> + align);
>
> Why is it valid for a device to have a zero minimum size?
It's not valid. The local variables atomic_write_unit_max and
atomic_write_unit_min here are in units of FS blocks - maybe I should
change the names.
The idea is that, for simplicity, we initially won't support atomic writes
on XFS smaller than 1x FS block. So if the bdev has - for example -
queue_atomic_write_unit_min_bytes() == 2K and
queue_atomic_write_unit_max_bytes() == 64K, then (ignoring alignment) we
say that unit_min_fsb = 1 and unit_max_fsb = 16 (for 4K FS blocks).
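The arithmetic in that example can be written out as a small userspace C sketch. The struct and function names are hypothetical, the conversion uses plain truncating division rather than the XFS macros, and it assumes a non-zero FS block size.

```c
#include <stdint.h>

struct demo_fsb_units {
	uint32_t unit_min_fsb;
	uint32_t unit_max_fsb;
};

/*
 * Convert device atomic write limits (bytes) to FS blocks, ignoring
 * alignment. The minimum is floored at one FS block because sub-block
 * atomic writes are not supported in this scheme.
 */
static struct demo_fsb_units demo_units_in_fsb(uint32_t min_bytes,
					       uint32_t max_bytes,
					       uint32_t fs_block_size)
{
	struct demo_fsb_units u = { 0, 0 };

	/* Device cannot atomically write even one FS block. */
	if (max_bytes < fs_block_size)
		return u;

	u.unit_min_fsb = min_bytes / fs_block_size;
	if (u.unit_min_fsb == 0)
		u.unit_min_fsb = 1;	/* e.g. 2K device min with 4K blocks */
	u.unit_max_fsb = max_bytes / fs_block_size;
	return u;
}
```

For the numbers above, demo_units_in_fsb(2048, 65536, 4096) gives unit_min_fsb = 1 and unit_max_fsb = 16.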
> If it can
> set a maximum, it should -always- set a minimum size as logical
> sector size is a valid lower bound, yes?
>
>> + } else {
>> + *unit_min_fsb = min_t(xfs_filblks_t, atomic_write_unit_min,
>> + align);
>> + *unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
>> + align);
>> + }
>
> Nothing here guarantees the power-of-2 sizes that the RWF_ATOMIC
> user interface requires....
atomic_write_unit_min and atomic_write_unit_max will be powers-of-2 (or 0).
But, you are right, we don't check align is a power-of-2 - that can be
added.
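The missing check on align could look roughly like the following userspace sketch. The helper name is hypothetical; in the kernel the existing is_power_of_2() from <linux/log2.h> would be the natural choice.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if n is a non-zero power of two: such a value has exactly one
 * bit set, so clearing the lowest set bit with n & (n - 1) leaves zero.
 */
static bool demo_is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

An extent size hint of 16 FS blocks passes this check, while a hint of 12 FS blocks does not and so could not be reported as an atomic write unit.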
>
> It also doesn't check that the extent size hint is aligned with
> atomic write units.
If we add a check for align being a power-of-2 and atomic_write_unit_min
and atomic_write_unit_max are already powers-of-2, then this can be
relied on, right?
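The reasoning behind this can be illustrated with a short sketch: if a = 2^i and b = 2^j with i <= j, then b / a = 2^(j - i) is a whole number, so the smaller value always divides the larger one. The helper below is hypothetical and only demonstrates that property in userspace C.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the smaller of (unit, align) evenly divides the larger.
 * This always holds when both are powers of two, and can fail when
 * align is not (for example unit = 8, align = 12).
 */
static bool demo_units_nest(uint64_t unit, uint64_t align)
{
	uint64_t small = unit < align ? unit : align;
	uint64_t large = unit < align ? align : unit;

	return small != 0 && large % small == 0;
}
```

For example, a unit of 8 nests within an alignment of 16, and a unit of 16 is a multiple of an alignment of 4, but a unit of 8 does not divide an alignment of 12.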
>
> It also doesn't check either against stripe unit alignment....
As mentioned in an earlier response, this could be enforced.
>
>> +}
>> +
>> STATIC int
>> xfs_vn_getattr(
>> struct mnt_idmap *idmap,
>> @@ -614,6 +654,17 @@ xfs_vn_getattr(
>> stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
>> stat->dio_offset_align = bdev_logical_block_size(bdev);
>> }
>> + if (request_mask & STATX_WRITE_ATOMIC) {
>> + xfs_filblks_t unit_min_fsb, unit_max_fsb;
>> +
>> + xfs_ip_atomic_write_attr(ip, &unit_min_fsb,
>> + &unit_max_fsb);
>> + stat->atomic_write_unit_min = XFS_FSB_TO_B(mp, unit_min_fsb);
>> + stat->atomic_write_unit_max = XFS_FSB_TO_B(mp, unit_max_fsb);
>
> That's just nasty. We pull byte units from the bdev, convert them to
> fsb to round them, then convert them back to byte counts. We should
> be doing all the work in one set of units....
ok, agreed. bytes is probably best.
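A rough sketch of doing the whole calculation in bytes, in userspace C. All names are hypothetical. It assumes the extent alignment is the only value that starts out in FS blocks (so it is converted once), that the minimum is floored at one FS block, and that the maximum is capped by the alignment.

```c
#include <stdint.h>

struct demo_byte_units {
	uint64_t min_bytes;
	uint64_t max_bytes;
};

/*
 * Compute atomic write unit min/max entirely in bytes.
 * align_fsb is the guaranteed extent alignment in FS blocks.
 * Returns {0, 0} when atomic writes cannot be supported.
 */
static struct demo_byte_units demo_atomic_units_bytes(uint64_t dev_min_bytes,
						      uint64_t dev_max_bytes,
						      uint64_t fs_block_size,
						      uint64_t align_fsb)
{
	struct demo_byte_units u = { 0, 0 };
	uint64_t align_bytes = align_fsb * fs_block_size;
	/* Max is limited by both the device and the extent alignment. */
	uint64_t max_bytes = dev_max_bytes < align_bytes ?
			     dev_max_bytes : align_bytes;
	/* Min is never smaller than one FS block. */
	uint64_t min_bytes = dev_min_bytes > fs_block_size ?
			     dev_min_bytes : fs_block_size;

	if (dev_max_bytes == 0 || max_bytes < min_bytes)
		return u;

	u.min_bytes = min_bytes;
	u.max_bytes = max_bytes;
	return u;
}
```

With a 2K to 64K device, 4K FS blocks and an alignment of 4 FS blocks, this reports 4096 and 16384 bytes directly, with no round trip through FS block counts.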
>
>> + stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
>> + stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
>> + stat->result_mask |= STATX_WRITE_ATOMIC;
>
> If the min/max are zero, then atomic writes are not supported on
> this inode, right? Why would we set any of the attributes or result
> mask to say it is supported on this file?
OK, we won't set STATX_ATTR_WRITE_ATOMIC when min/max are zero.
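As a sketch of that change in userspace C: struct demo_kstat and the DEMO_* flag values are placeholders, not the real struct kstat or the real statx constants. The sketch assumes that the result mask is also left unset in the unsupported case, which the reply above does not state explicitly.

```c
#include <stdint.h>

/* Placeholder flag values for illustration only. */
#define DEMO_STATX_WRITE_ATOMIC		0x1u
#define DEMO_STATX_ATTR_WRITE_ATOMIC	0x1ull

/* Hypothetical cut-down stand-in for struct kstat. */
struct demo_kstat {
	uint64_t attributes;
	uint64_t attributes_mask;
	uint32_t result_mask;
	uint32_t atomic_write_unit_min;
	uint32_t atomic_write_unit_max;
};

/* Report atomic write support only when the units are non-zero. */
static void demo_fill_atomic(struct demo_kstat *stat,
			     uint32_t min_bytes, uint32_t max_bytes)
{
	/* Zero min/max means atomic writes are unsupported on this inode. */
	if (min_bytes == 0 || max_bytes == 0)
		return;

	stat->atomic_write_unit_min = min_bytes;
	stat->atomic_write_unit_max = max_bytes;
	stat->attributes |= DEMO_STATX_ATTR_WRITE_ATOMIC;
	stat->attributes_mask |= DEMO_STATX_ATTR_WRITE_ATOMIC;
	stat->result_mask |= DEMO_STATX_WRITE_ATOMIC;
}
```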
Thanks,
John