All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Roesch <shr@fb.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: io-uring@vger.kernel.org, kernel-team@fb.com, linux-mm@kvack.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	david@fromorbit.com, jack@suse.cz, hch@infradead.org,
	axboe@kernel.dk
Subject: Re: [PATCH v8 06/14] iomap: Return -EAGAIN from iomap_write_iter()
Date: Thu, 9 Jun 2022 11:49:34 -0700	[thread overview]
Message-ID: <d46b77d5-29d6-d3c0-43dc-72d5d41e111b@fb.com> (raw)
In-Reply-To: <YqDyT7uSd0vv15aL@casper.infradead.org>



On 6/8/22 12:02 PM, Matthew Wilcox wrote:
> On Wed, Jun 08, 2022 at 10:17:33AM -0700, Stefan Roesch wrote:
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index b06a5c24a4db..f701dcb7c26a 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -829,7 +829,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>>  		length -= status;
>>  	} while (iov_iter_count(i) && length);
>>  
>> -	return written ? written : status;
>> +	if (status == -EAGAIN) {
>> +		iov_iter_revert(i, written);
>> +		return -EAGAIN;
>> +	}
>> +	if (written)
>> +		return written;
>> +	return status;
>>  }
> 
> I still don't understand how this can possibly work.  Walk me through it.
> 
> Let's imagine we have a file laid out such that extent 1 is bytes
> 0-4095 of the file and extent 2 is extent 4096-16385 of the file.
> We do a write of 5000 bytes starting at offset 4000 of the file.
> 
> iomap_iter() tells us about the first extent and we write the first
> 96 bytes of our data to the first extent, returning 96.  iomap_iter()
> tells us about the second extent, and we write the next 4000 bytes to
> the second extent.  Then we get a page fault and get to the -EAGAIN case.
> We rewind the iter 4000 bytes.
> 

We have two data structures, the iomap_iter and iov_iter. After the first
96 bytes, the iov_iter offset get updated in iomap_write_iter() and then the
iomap_iter pos gets updated in iomap_iter()->iomap_iter_advance().

We then get the second extend from iomap_iter(). In iomap_write_iter() the
first page is obtained and written successfully, then the second page is
faulted. At this point the iov offset of the iov_iter has advanced. To reset
it to the state when the function iomap_write_iter() was entered, the iov_iter
is reset to iov_offset - written bytes.

iomap_write_iter() is exited and returns -EAGAIN. As iomap_write_iter() returns
an error, the iomap_iter pos is not updated in iomap_iter(). Only the number
of bytes written in the write of the first extent from iomap_file_buffered_write()
is returned from iomap_file_buffered_write().

In xfs_file_buffered_write we updated the iocb->ki_pos with the number of
bytes written. In io-uring, the io_write() call receives the short write result.
It copies the iov_iter struct into the work context for the io worker.
The io_worker uses that information to complete the rest of the write.

The above reset is required to keep the pos in iomap_iter and the offset in
iov_iter in sync.



Side Note:

I had an earlier version of the patch that was changing the signature of the
function iomap_write_iter(). It was returning a return code and changing the
processed value of the iomap_iter (which then also changes the pos value of 
the iomap_iter). This version (version 7 of the patch) does not require to
reset the offset of the iov_iter. It can update the pos in iomap_iter even
when -EAGAIN is returned.



> How do we not end up writing garbage when the kworker does the retry?
> I'd understand if we rewound the iter all the way to the start.  Or if
> we didn't rewind the iter at all and were able to pick up partway through
> the write.  But rewinding to the start of the extent feels like it can't
> possibly work.

  reply	other threads:[~2022-06-09 18:50 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-08 17:17 [PATCH v8 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 01/14] mm: Move starting of background writeback into the main balancing loop Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 02/14] mm: Move updates of dirty_exceeded into one place Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 03/14] mm: Add balance_dirty_pages_ratelimited_flags() function Stefan Roesch
2022-06-08 18:44   ` Matthew Wilcox
2022-06-08 17:17 ` [PATCH v8 04/14] iomap: Add flags parameter to iomap_page_create() Stefan Roesch
2022-06-08 18:51   ` Matthew Wilcox
2022-06-08 17:17 ` [PATCH v8 05/14] iomap: Add async buffered write support Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 06/14] iomap: Return -EAGAIN from iomap_write_iter() Stefan Roesch
2022-06-08 19:02   ` Matthew Wilcox
2022-06-09 18:49     ` Stefan Roesch [this message]
2022-06-08 17:17 ` [PATCH v8 07/14] fs: Add check for async buffered writes to generic_write_checks Stefan Roesch
2022-06-10 11:50   ` Christian Brauner
2022-06-08 17:17 ` [PATCH v8 08/14] fs: add __remove_file_privs() with flags parameter Stefan Roesch
2022-06-10 11:53   ` Christian Brauner
2022-06-13  8:50   ` [fs] b6c81e63ec: phoronix-test-suite.fio.SequentialWrite.Sync.Yes.No.4KB.DefaultTestDirectory.mb_s -4.3% regression kernel test robot
2022-06-13  8:50     ` kernel test robot
2022-06-08 17:17 ` [PATCH v8 09/14] fs: Split off inode_needs_update_time and __file_update_time Stefan Roesch
2022-06-10 11:55   ` Christian Brauner
2022-06-08 17:17 ` [PATCH v8 10/14] fs: Add async write file modification handling Stefan Roesch
2022-06-10 12:38   ` Christian Brauner
2022-06-08 17:17 ` [PATCH v8 11/14] io_uring: Add support for async buffered writes Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 12/14] io_uring: Add tracepoint for short writes Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 13/14] xfs: Specify lockmode when calling xfs_ilock_for_iomap() Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 14/14] xfs: Add async buffered write support Stefan Roesch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d46b77d5-29d6-d3c0-43dc-72d5d41e111b@fb.com \
    --to=shr@fb.com \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=io-uring@vger.kernel.org \
    --cc=jack@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.