From: Dave Chinner <david@fromorbit.com>
To: Stefan Roesch <shr@fb.com>
Cc: io-uring@vger.kernel.org, kernel-team@fb.com, linux-mm@kvack.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	jack@suse.cz, hch@infradead.org
Subject: Re: [PATCH v6 05/16] iomap: Add async buffered write support
Date: Fri, 27 May 2022 08:37:05 +1000
Message-ID: <20220526223705.GJ1098723@dread.disaster.area>
In-Reply-To: <20220526173840.578265-6-shr@fb.com>

On Thu, May 26, 2022 at 10:38:29AM -0700, Stefan Roesch wrote:
> This adds async buffered write support to iomap.
> 
> This replaces the call to balance_dirty_pages_ratelimited() with a
> call to balance_dirty_pages_ratelimited_flags(). This allows the caller
> to specify whether the write request is async.
> 
> In addition this also moves the above function call to the beginning of
> the function. If the function call is at the end of the function and the
> decision is made to throttle writes, then there is no request that
> io-uring can wait on. By moving it to the beginning of the function, the
> write request is not issued, but returns -EAGAIN instead. io-uring will
> punt the request and process it in the io-worker.
> 
> By moving the function call to the beginning of the function, the write
> throttling will happen one page later.

Won't it happen one page sooner? I.e. on single page writes we'll
end up throttling *before* we dirty the page, not *after* we dirty
the page. IOWs, we can't wait for the page that we just dirtied to
be cleaned to make progress and so this now makes the loop dependent
on pages dirtied by other writers being cleaned to guarantee
forwards progress?

That seems like a subtle but quite significant change of
algorithm...

> Signed-off-by: Stefan Roesch <shr@fb.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d6ddc54e190e..2281667646d2 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -559,6 +559,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	loff_t block_size = i_blocksize(iter->inode);
>  	loff_t block_start = round_down(pos, block_size);
>  	loff_t block_end = round_up(pos + len, block_size);
> +	unsigned int nr_blocks = i_blocks_per_folio(iter->inode, folio);
>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>  	size_t poff, plen;
>  
> @@ -567,6 +568,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	folio_clear_error(folio);
>  
>  	iop = iomap_page_create(iter->inode, folio, iter->flags);
> +	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
> +		return -EAGAIN;
>  

Hmmm. I see what looks to be an undesirable pattern here...

1. Memory allocation failure here on the second page of a write.

> @@ -806,8 +828,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		pos += status;
>  		written += status;
>  		length -= status;
> -
> -		balance_dirty_pages_ratelimited(iter->inode->i_mapping);
>  	} while (iov_iter_count(i) && length);
>  
>  	return written ? written : status;

2. We break and return 4kB from the first page copied.

> @@ -825,6 +845,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>  	};
>  	int ret;
>  
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		iter.flags |= IOMAP_NOWAIT;
> +
>  	while ((ret = iomap_iter(&iter, ops)) > 0)
>  		iter.processed = iomap_write_iter(&iter, i);

3. This sets iter.processed = 4kB, and we call iomap_iter() again.
This sees iter.processed > 0 and there's still more to write, so
it returns 1, and we go around the loop again.

Hence spurious memory allocation failures in the IOMAP_NOWAIT case will
not cause this buffered write loop to exit. Worst case, we fail
allocation on every second __iomap_write_begin() call, and so the
write takes much longer and consumes lots more CPU hammering memory
allocation, because no single allocation failure will cause the write
to return a short write to the caller.

This seems undesirable to me. If we are failing memory allocations,
we need to back off, not hammer memory allocation harder without
allowing reclaim to make progress...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 46+ messages
2022-05-26 17:38 [PATCH v6 00/16] io-uring/xfs: support async buffered writes Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 01/16] mm: Move starting of background writeback into the main balancing loop Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 02/16] mm: Move updates of dirty_exceeded into one place Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 03/16] mm: Add balance_dirty_pages_ratelimited_flags() function Stefan Roesch
2022-05-31  6:52   ` Christoph Hellwig
2022-05-26 17:38 ` [PATCH v6 04/16] iomap: Add flags parameter to iomap_page_create() Stefan Roesch
2022-05-26 18:25   ` Darrick J. Wong
2022-05-26 18:43     ` Stefan Roesch
2022-06-01  0:34     ` Olivier Langlois
2022-06-01  8:21       ` Jan Kara
2022-06-01 17:29         ` Olivier Langlois
2022-05-31  6:54   ` Christoph Hellwig
2022-05-31 18:12     ` Stefan Roesch
2022-06-01 17:56       ` Darrick J. Wong
2022-05-26 17:38 ` [PATCH v6 05/16] iomap: Add async buffered write support Stefan Roesch
2022-05-26 18:42   ` Darrick J. Wong
2022-05-26 22:37   ` Dave Chinner [this message]
2022-05-27  8:42     ` Jan Kara
2022-05-27 22:52       ` Dave Chinner
2022-05-31  7:55         ` Jan Kara
2022-05-31  6:58   ` Christoph Hellwig
2022-05-26 17:38 ` [PATCH v6 06/16] fs: Add check for async buffered writes to generic_write_checks Stefan Roesch
2022-05-31  6:59   ` Christoph Hellwig
2022-05-26 17:38 ` [PATCH v6 07/16] fs: add __remove_file_privs() with flags parameter Stefan Roesch
2022-05-31  7:00   ` Christoph Hellwig
2022-05-26 17:38 ` [PATCH v6 08/16] fs: Split off inode_needs_update_time and __file_update_time Stefan Roesch
2022-05-31  7:01   ` Christoph Hellwig
2022-05-31 19:02     ` Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 09/16] fs: Add async write file modification handling Stefan Roesch
2022-05-31  7:01   ` Christoph Hellwig
2022-05-26 17:38 ` [PATCH v6 10/16] fs: Optimization for concurrent file time updates Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 11/16] io_uring: Add support for async buffered writes Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 12/16] io_uring: Add tracepoint for short writes Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 13/16] xfs: Specify lockmode when calling xfs_ilock_for_iomap() Stefan Roesch
2022-05-31  7:03   ` Christoph Hellwig
2022-05-26 17:38 ` [PATCH v6 14/16] xfs: Change function signature of xfs_ilock_iocb() Stefan Roesch
2022-05-31  7:04   ` Christoph Hellwig
2022-05-31 19:15     ` Stefan Roesch
2022-06-01  5:26       ` Christoph Hellwig
2022-06-01 17:15         ` Stefan Roesch
2022-05-26 17:38 ` [PATCH v6 15/16] xfs: Add async buffered write support Stefan Roesch
2022-05-31  7:05   ` Christoph Hellwig
2022-05-26 17:38 ` [PATCH v6 16/16] xfs: Enable " Stefan Roesch
2022-05-31  7:05   ` Christoph Hellwig
2022-05-31 19:18     ` Stefan Roesch
2022-05-26 18:12 ` [PATCH v6 00/16] io-uring/xfs: support async buffered writes Matthew Wilcox
