linux-f2fs-devel.lists.sourceforge.net archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-cifs@vger.kernel.org, cluster-devel@redhat.com,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	linux-f2fs-devel@lists.sourceforge.net,
	Vishal Moola <vishal.moola@gmail.com>,
	linux-mm@kvack.org, linux-nilfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-afs@lists.infradead.org,
	linux-btrfs@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH 04/23] page-writeback: Convert write_cache_pages() to use filemap_get_folios_tag()
Date: Thu, 3 Nov 2022 19:45:01 -0700
Message-ID: <Y2R8rRr0ZdrlT32m@magnolia>
In-Reply-To: <20221104003235.GZ2703033@dread.disaster.area>

On Fri, Nov 04, 2022 at 11:32:35AM +1100, Dave Chinner wrote:
> On Thu, Nov 03, 2022 at 03:28:05PM -0700, Vishal Moola wrote:
> > On Wed, Oct 19, 2022 at 08:01:52AM +1100, Dave Chinner wrote:
> > > On Thu, Sep 01, 2022 at 03:01:19PM -0700, Vishal Moola (Oracle) wrote:
> > > > Converted function to use folios throughout. This is in preparation for
> > > > the removal of find_get_pages_range_tag().
> > > > 
> > > > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> > > > ---
> > > >  mm/page-writeback.c | 44 +++++++++++++++++++++++---------------------
> > > >  1 file changed, 23 insertions(+), 21 deletions(-)
> > > > 
> > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > index 032a7bf8d259..087165357a5a 100644
> > > > --- a/mm/page-writeback.c
> > > > +++ b/mm/page-writeback.c
> > > > @@ -2285,15 +2285,15 @@ int write_cache_pages(struct address_space *mapping,
> > > >  	int ret = 0;
> > > >  	int done = 0;
> > > >  	int error;
> > > > -	struct pagevec pvec;
> > > > -	int nr_pages;
> > > > +	struct folio_batch fbatch;
> > > > +	int nr_folios;
> > > >  	pgoff_t index;
> > > >  	pgoff_t end;		/* Inclusive */
> > > >  	pgoff_t done_index;
> > > >  	int range_whole = 0;
> > > >  	xa_mark_t tag;
> > > >  
> > > > -	pagevec_init(&pvec);
> > > > +	folio_batch_init(&fbatch);
> > > >  	if (wbc->range_cyclic) {
> > > >  		index = mapping->writeback_index; /* prev offset */
> > > >  		end = -1;
> > > > @@ -2313,17 +2313,18 @@ int write_cache_pages(struct address_space *mapping,
> > > >  	while (!done && (index <= end)) {
> > > >  		int i;
> > > >  
> > > > -		nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
> > > > -				tag);
> > > > -		if (nr_pages == 0)
> > > > +		nr_folios = filemap_get_folios_tag(mapping, &index, end,
> > > > +				tag, &fbatch);
> > > 
> > > This can find and return dirty multi-page folios if the filesystem
> > > enables them in the mapping at instantiation time, right?
> > 
> > Yup, it will.
> > 
> > > > +
> > > > +		if (nr_folios == 0)
> > > >  			break;
> > > >  
> > > > -		for (i = 0; i < nr_pages; i++) {
> > > > -			struct page *page = pvec.pages[i];
> > > > +		for (i = 0; i < nr_folios; i++) {
> > > > +			struct folio *folio = fbatch.folios[i];
> > > >  
> > > > -			done_index = page->index;
> > > > +			done_index = folio->index;
> > > >  
> > > > -			lock_page(page);
> > > > +			folio_lock(folio);
> > > >  
> > > >  			/*
> > > >  			 * Page truncated or invalidated. We can freely skip it
> > > > @@ -2333,30 +2334,30 @@ int write_cache_pages(struct address_space *mapping,
> > > >  			 * even if there is now a new, dirty page at the same
> > > >  			 * pagecache address.
> > > >  			 */
> > > > -			if (unlikely(page->mapping != mapping)) {
> > > > +			if (unlikely(folio->mapping != mapping)) {
> > > >  continue_unlock:
> > > > -				unlock_page(page);
> > > > +				folio_unlock(folio);
> > > >  				continue;
> > > >  			}
> > > >  
> > > > -			if (!PageDirty(page)) {
> > > > +			if (!folio_test_dirty(folio)) {
> > > >  				/* someone wrote it for us */
> > > >  				goto continue_unlock;
> > > >  			}
> > > >  
> > > > -			if (PageWriteback(page)) {
> > > > +			if (folio_test_writeback(folio)) {
> > > >  				if (wbc->sync_mode != WB_SYNC_NONE)
> > > > -					wait_on_page_writeback(page);
> > > > +					folio_wait_writeback(folio);
> > > >  				else
> > > >  					goto continue_unlock;
> > > >  			}
> > > >  
> > > > -			BUG_ON(PageWriteback(page));
> > > > -			if (!clear_page_dirty_for_io(page))
> > > > +			BUG_ON(folio_test_writeback(folio));
> > > > +			if (!folio_clear_dirty_for_io(folio))
> > > >  				goto continue_unlock;
> > > >  
> > > >  			trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> > > > -			error = (*writepage)(page, wbc, data);
> > > > +			error = writepage(&folio->page, wbc, data);
> > > 
> > > Yet, IIUC, this treats all folios as if they are single page folios.
> > > i.e. it passes the head page of a multi-page folio to a callback
> > > that will treat it as a single PAGE_SIZE page, because that's all
> > > the writepage callbacks are currently expected to be passed...
> > > 
> > > So won't this break writeback of dirty multipage folios?
> > 
> > Yes, it appears it would. But it wouldn't because it's already 'broken'.
> 
> It is? Then why isn't XFS broken on existing kernels? Oh, we don't
> know because it hasn't been tested?
> 
> Seriously - if this really is broken, and this patchset further
> propagating the brokenness, then somebody needs to explain to me why
> this is not corrupting data in XFS.

It looks like iomap_do_writepage finds the folio size correctly

	end_pos = folio_pos(folio) + folio_size(folio);

and iomap_writepage_map will map out the correct number of blocks

	unsigned nblocks = i_blocks_per_folio(inode, folio);

	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {

right?  The interface is dangerous because anyone who enables multipage
folios has to be aware that ->writepage can be handed a multipage folio.

(That said, the lack of mention of xfs in the testing plan doesn't give
me much confidence anyone has checked this...)
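
For illustration, here is a userspace sketch of that arithmetic. It is not
kernel code: struct model_folio and every model_* helper are made-up
stand-ins for the real folio API, and a 4096-byte page is assumed. It
contrasts the range a folio-aware callback covers with the range a callback
that assumes PAGE_SIZE would cover:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12			/* assumption: 4096-byte pages */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-in for struct folio: only what this sketch needs. */
struct model_folio {
	unsigned long index;	/* page cache index of the first page */
	unsigned int order;	/* the folio spans 2^order pages */
};

/* Byte offset of the folio within the file (models folio_pos()). */
static unsigned long long model_folio_pos(const struct model_folio *f)
{
	return (unsigned long long)f->index << MODEL_PAGE_SHIFT;
}

/* Size of the folio in bytes (models folio_size()). */
static unsigned long long model_folio_size(const struct model_folio *f)
{
	return (unsigned long long)MODEL_PAGE_SIZE << f->order;
}

/* End of the range a folio-aware callback writes back. */
static unsigned long long folio_aware_end(const struct model_folio *f)
{
	return model_folio_pos(f) + model_folio_size(f);
}

/* End of the range a callback assuming a single PAGE_SIZE page writes back. */
static unsigned long long page_only_end(const struct model_folio *f)
{
	return model_folio_pos(f) + MODEL_PAGE_SIZE;
}

/* Filesystem blocks covered by the folio (models i_blocks_per_folio()). */
static unsigned int model_blocks_per_folio(const struct model_folio *f,
					   unsigned int blkbits)
{
	assert(blkbits <= MODEL_PAGE_SHIFT);
	return (unsigned int)(model_folio_size(f) >> blkbits);
}
```

For an order-2 folio at index 4, folio_aware_end() gives 32768 while
page_only_end() gives 20480, so a callback that only handles PAGE_SIZE
would leave 12288 dirty bytes unwritten.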

> I get it that page/folios are in transition, but passing a
> multi-page folio page to an interface that expects a PAGE_SIZE
> struct page is a pretty nasty landmine, regardless of how broken the
> higher level iteration code already might be.
> 
> At minimum, it needs to be documented, though I'd much prefer that
> we explicitly duplicate write_cache_pages() as write_cache_folios()
> with a callback that takes a folio and change the code to be fully
> multi-page folio safe. Then filesystems that support folios (and
> large folios) natively can be passed folios without going through
> this crappy "folio->page, page->folio" dance because the writepage
> APIs are unaware of multi-page folio constructs.

Agree.  Build the new one, move callers over, and kill the old one.
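
Roughly the shape I have in mind, as a userspace model rather than a patch.
Everything here is hypothetical (model_write_cache_folios(), the callback
type, struct model_folio); locking, tagging, writeback waits and wbc
handling are all left out. The only point is that the callback receives the
folio itself and never a struct page:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio. */
struct model_folio {
	unsigned long index;	/* page cache index of the first page */
	unsigned int order;	/* the folio spans 2^order pages */
	bool dirty;
};

/* Hypothetical folio-based callback: it sees the whole folio. */
typedef int (*model_folio_writepage_t)(struct model_folio *folio, void *data);

/*
 * Walk a batch of folios and hand each dirty one to the callback.
 * Stops at the first error and returns it, otherwise returns 0.
 */
static int model_write_cache_folios(struct model_folio *folios, size_t nr,
				    model_folio_writepage_t writepage,
				    void *data)
{
	size_t i;
	int error;

	assert(writepage != NULL);

	for (i = 0; i < nr; i++) {
		struct model_folio *folio = &folios[i];

		if (!folio->dirty)
			continue;	/* someone wrote it for us */

		folio->dirty = false;	/* models folio_clear_dirty_for_io() */
		error = writepage(folio, data);
		if (error)
			return error;
	}
	return 0;
}

/* Example callback: count how many pages' worth of data was submitted. */
static int model_count_pages(struct model_folio *folio, void *data)
{
	unsigned long *pages = data;

	*pages += 1UL << folio->order;
	return 0;
}
```

With a batch holding a dirty order-0 folio, a clean folio and a dirty
order-2 folio, model_count_pages() ends up at 5 pages: the large folio is
accounted in full instead of being treated as one page.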

> Then you can convert the individual filesystems using
> write_cache_pages() to call write_cache_folios() one at a time,
> updating the filesystem callback to do the conversion from folio to
> struct page and checking that it is an order-0 page that it has been
> handed....
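
A sketch of what that per-filesystem shim could look like, again as a
userspace model with made-up names (struct model_page_adapter,
model_folio_to_page_writepage()). Returning -EINVAL for a large folio is my
assumption for the sketch; real code might prefer a WARN_ON_ONCE():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct folio. */
struct model_page {
	unsigned long index;
};

struct model_folio {
	struct model_page page;	/* head page */
	unsigned int order;	/* the folio spans 2^order pages */
};

/* Legacy callback type: only understands a single PAGE_SIZE page. */
typedef int (*model_page_writepage_t)(struct model_page *page, void *data);

/* Carries the legacy callback and its private data through the shim. */
struct model_page_adapter {
	model_page_writepage_t writepage;
	void *data;
};

/*
 * Folio callback for a filesystem that has not been converted yet.
 * Refuses large folios at runtime instead of silently writing one page.
 */
static int model_folio_to_page_writepage(struct model_folio *folio, void *data)
{
	struct model_page_adapter *adapter = data;

	assert(adapter != NULL && adapter->writepage != NULL);

	if (folio->order != 0)
		return -EINVAL;	/* assumption: fail loudly on large folios */

	return adapter->writepage(&folio->page, adapter->data);
}

/* Example legacy callback: remember the index of the last page written. */
static int model_record_index(struct model_page *page, void *data)
{
	unsigned long *last = data;

	*last = page->index;
	return 0;
}
```

An order-0 folio passes straight through to the legacy callback; anything
larger is rejected before the page-only code ever sees it.
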
> 
> > The current find_get_pages_range_tag() actually has the exact same
> > issue. The current code to fill up the pages array is:
> > 
> > 		pages[ret] = &folio->page;
> > 		if (++ret == nr_pages) {
> > 			*index = folio->index + folio_nr_pages(folio);
> > 			goto out;
> 
> "It's already broken so we can make it more broken" isn't an
> acceptable answer....
> 
> > It's not great to leave it 'broken' but it's something that isn't - or at
> > least shouldn't be - creating any problems at present. And I believe Matthew
> > has plans to address them at some point before they actually become problems?
> 
> You are modifying the interfaces and doing folio conversions that
> expose and propagate the brokenness. The brokenness needs to be
> either avoided or fixed and not propagated further. Doing the above
> write_cache_folios() conversion avoids propagating the
> brokenness, adds runtime detection of brokenness, and provides the
> right interface for writeback iteration of folios.
> 
> Fixing the generic writeback iterator properly is not much extra
> work, and it sets the model for filesystems that have copy-pasted
> write_cache_pages() and then hacked it around for their own purposes
> (e.g. ext4, btrfs) to follow.
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


