linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: djwong@kernel.org, hch@lst.de, viro@zeniv.linux.org.uk,
	brauner@kernel.org, jack@suse.cz, chandan.babu@oracle.com,
	axboe@kernel.dk, martin.petersen@oracle.com,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
	ojaswin@linux.ibm.com, ritesh.list@gmail.com,
	linux-block@vger.kernel.org
Subject: Re: [PATCH v2 08/14] fs: xfs: iomap: Sub-extent zeroing
Date: Thu, 7 Mar 2024 12:57:06 +0000	[thread overview]
Message-ID: <66204b26-7a8a-4c21-8728-aa3c5af5822c@oracle.com> (raw)
In-Reply-To: <Zejnc+M32wRIutNZ@dread.disaster.area>


>>   
>>   	if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock)))
>>   		return xfs_alert_fsblock_zero(ip, imap);
>> @@ -134,6 +135,8 @@ xfs_bmbt_to_iomap(
>>   
>>   	iomap->validity_cookie = sequence_cookie;
>>   	iomap->folio_ops = &xfs_iomap_folio_ops;
>> +	if (extsz > 1)
>> +		iomap->extent_shift = ffs(extsz) - 1;
> 
> 	iomap->extent_size = mp->m_bsize;
> 	if (xfs_inode_has_force_align(ip))
> 		iomap->extent_size *= ip->i_extsize;

ok, fine


> 
>> @@ -563,11 +566,19 @@ xfs_iomap_write_unwritten(
>>   	xfs_fsize_t	i_size;
>>   	uint		resblks;
>>   	int		error;
>> +	xfs_extlen_t	extsz = xfs_get_extsz(ip);
>>   
>>   	trace_xfs_unwritten_convert(ip, offset, count);
>>   
>> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>> -	count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>> +	if (extsz > 1) {
>> +		xfs_extlen_t extsize_bytes = XFS_FSB_TO_B(mp, extsz);
>> +
>> +		offset_fsb = XFS_B_TO_FSBT(mp, round_down(offset, extsize_bytes));
>> +		count_fsb = XFS_B_TO_FSB(mp, round_up(offset + count, extsize_bytes));
>> +	} else {
>> +		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>> +		count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>> +	}
> 
> I don't think this is correct. We should only be converting the
> extent when the entire range has had data written to it. If we are
> doing unaligned writes, we end up running 3 separate unwritten
> conversion transactions - the leading zeroing, the data written and
> the trailing zeroing.

Then I missed that in the code.

For sub-FS block conversion, I thought that this was doing the complete 
FS blocks conversion, including for the head and tail zeros. And now for 
sub-extent writes, we would be similarly doing the full extent 
conversion, including head and tail zeros.

> 
> This will end up converting the entire range to written when the
> leading zeroing completes, exposing stale data until the data and
> trailing zeroing completes.

That would not be good.

> 
> Concurrent reads (both DIO and buffered) can see this stale data
> while the write is in progress, leading to a mechanism where a user
> can issue sub-atomic write range IO and concurrent overlapping reads
> to read arbitrary stale data from the disk just before it is
> overwritten.
> 
> I suspect the only way to fix this for sub-force-aligned DIo writes
> if for iomap_dio_bio_iter() to chain the zeroing and data bios so
> the entire range gets a single completion run on it instead of three
> separate sub-aligned extent IO completions. We only need to do this
> in the zeroing case - this is already the DIo slow path, so
> additional submission overhead is not an issue. It would, however,
> reduce completion overhead and latency, as we only need to run a
> single extent conversion instead of 3, so chaining the bios on
> aligned writes may well be a net win...
> 

ok, I'll check that idea.

> Thoughts? Christoph probably needs to weigh in on this one...
> 

ok

Cheers,
John


  reply	other threads:[~2024-03-07 12:57 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-04 13:04 [PATCH v2 00/14] block atomic writes for XFS John Garry
2024-03-04 13:04 ` [PATCH v2 01/14] block: Add blk_validate_atomic_write_op_size() John Garry
2024-03-04 13:04 ` [PATCH v2 02/14] fs: xfs: Don't use low-space allocator for alignment > 1 John Garry
2024-03-04 22:15   ` Dave Chinner
2024-03-05 13:36     ` John Garry
2024-03-04 13:04 ` [PATCH v2 03/14] fs: xfs: Introduce FORCEALIGN inode flag John Garry
2024-03-04 13:04 ` [PATCH v2 04/14] fs: xfs: Make file data allocations observe the 'forcealign' flag John Garry
2024-03-05  0:44   ` Dave Chinner
2024-03-05 15:22     ` John Garry
2024-03-05 22:18       ` Dave Chinner
2024-03-06  9:41         ` John Garry
2024-03-04 13:04 ` [PATCH v2 05/14] fs: xfs: Enable file data forcealign feature John Garry
2024-03-04 13:04 ` [PATCH v2 06/14] fs: xfs: Do not free EOF blocks for forcealign John Garry
2024-03-06 21:07   ` Dave Chinner
2024-03-07 11:38     ` John Garry
2024-03-04 13:04 ` [PATCH v2 07/14] fs: iomap: Sub-extent zeroing John Garry
2024-03-06 21:14   ` Dave Chinner
2024-03-07 11:51     ` John Garry
2024-03-04 13:04 ` [PATCH v2 08/14] fs: xfs: " John Garry
2024-03-06 22:00   ` Dave Chinner
2024-03-07 12:57     ` John Garry [this message]
2024-03-04 13:04 ` [PATCH v2 09/14] fs: Add FS_XFLAG_ATOMICWRITES flag John Garry
2024-03-04 13:04 ` [PATCH v2 10/14] fs: iomap: Atomic write support John Garry
2024-03-04 13:04 ` [PATCH v2 11/14] fs: xfs: Support FS_XFLAG_ATOMICWRITES for forcealign John Garry
2024-03-06 21:43   ` Dave Chinner
2024-03-07 12:42     ` John Garry
2024-03-04 13:04 ` [PATCH v2 12/14] fs: xfs: Support atomic write for statx John Garry
2024-03-06 21:31   ` Dave Chinner
2024-03-07 10:35     ` John Garry
2024-03-04 13:04 ` [PATCH v2 13/14] fs: xfs: Validate atomic writes John Garry
2024-03-06 21:22   ` Dave Chinner
2024-03-07 10:19     ` John Garry
2024-03-04 13:04 ` [PATCH v2 14/14] fs: xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry
2024-03-06 21:33   ` Dave Chinner
2024-03-07 11:55     ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66204b26-7a8a-4c21-8728-aa3c5af5822c@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jbongio@google.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).