From: Dave Chinner <david@fromorbit.com>
To: Vishal Moola <vishal.moola@gmail.com>
Cc: linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-afs@lists.infradead.org, linux-btrfs@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH 04/23] page-writeback: Convert write_cache_pages() to use filemap_get_folios_tag()
Date: Fri, 4 Nov 2022 11:32:35 +1100	[thread overview]
Message-ID: <20221104003235.GZ2703033@dread.disaster.area> (raw)
In-Reply-To: <Y2RAdUtJrOJmYU4L@fedora>

On Thu, Nov 03, 2022 at 03:28:05PM -0700, Vishal Moola wrote:
> On Wed, Oct 19, 2022 at 08:01:52AM +1100, Dave Chinner wrote:
> > On Thu, Sep 01, 2022 at 03:01:19PM -0700, Vishal Moola (Oracle) wrote:
> > > Converted function to use folios throughout. This is in preparation for
> > > the removal of find_get_pages_range_tag().
> > > 
> > > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> > > ---
> > >  mm/page-writeback.c | 44 +++++++++++++++++++++++---------------------
> > >  1 file changed, 23 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index 032a7bf8d259..087165357a5a 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -2285,15 +2285,15 @@ int write_cache_pages(struct address_space *mapping,
> > >  	int ret = 0;
> > >  	int done = 0;
> > >  	int error;
> > > -	struct pagevec pvec;
> > > -	int nr_pages;
> > > +	struct folio_batch fbatch;
> > > +	int nr_folios;
> > >  	pgoff_t index;
> > >  	pgoff_t end;		/* Inclusive */
> > >  	pgoff_t done_index;
> > >  	int range_whole = 0;
> > >  	xa_mark_t tag;
> > >  
> > > -	pagevec_init(&pvec);
> > > +	folio_batch_init(&fbatch);
> > >  	if (wbc->range_cyclic) {
> > >  		index = mapping->writeback_index; /* prev offset */
> > >  		end = -1;
> > > @@ -2313,17 +2313,18 @@ int write_cache_pages(struct address_space *mapping,
> > >  	while (!done && (index <= end)) {
> > >  		int i;
> > >  
> > > -		nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
> > > -				tag);
> > > -		if (nr_pages == 0)
> > > +		nr_folios = filemap_get_folios_tag(mapping, &index, end,
> > > +				tag, &fbatch);
> > 
> > This can find and return dirty multi-page folios if the filesystem
> > enables them in the mapping at instantiation time, right?
> 
> Yup, it will.
> 
> > > +
> > > +		if (nr_folios == 0)
> > >  			break;
> > >  
> > > -		for (i = 0; i < nr_pages; i++) {
> > > -			struct page *page = pvec.pages[i];
> > > +		for (i = 0; i < nr_folios; i++) {
> > > +			struct folio *folio = fbatch.folios[i];
> > >  
> > > -			done_index = page->index;
> > > +			done_index = folio->index;
> > >  
> > > -			lock_page(page);
> > > +			folio_lock(folio);
> > >  
> > >  			/*
> > >  			 * Page truncated or invalidated. We can freely skip it
> > > @@ -2333,30 +2334,30 @@ int write_cache_pages(struct address_space *mapping,
> > >  			 * even if there is now a new, dirty page at the same
> > >  			 * pagecache address.
> > >  			 */
> > > -			if (unlikely(page->mapping != mapping)) {
> > > +			if (unlikely(folio->mapping != mapping)) {
> > >  continue_unlock:
> > > -				unlock_page(page);
> > > +				folio_unlock(folio);
> > >  				continue;
> > >  			}
> > >  
> > > -			if (!PageDirty(page)) {
> > > +			if (!folio_test_dirty(folio)) {
> > >  				/* someone wrote it for us */
> > >  				goto continue_unlock;
> > >  			}
> > >  
> > > -			if (PageWriteback(page)) {
> > > +			if (folio_test_writeback(folio)) {
> > >  				if (wbc->sync_mode != WB_SYNC_NONE)
> > > -					wait_on_page_writeback(page);
> > > +					folio_wait_writeback(folio);
> > >  				else
> > >  					goto continue_unlock;
> > >  			}
> > >  
> > > -			BUG_ON(PageWriteback(page));
> > > -			if (!clear_page_dirty_for_io(page))
> > > +			BUG_ON(folio_test_writeback(folio));
> > > +			if (!folio_clear_dirty_for_io(folio))
> > >  				goto continue_unlock;
> > >  
> > >  			trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> > > -			error = (*writepage)(page, wbc, data);
> > > +			error = writepage(&folio->page, wbc, data);
> > 
> > Yet, IIUC, this treats all folios as if they are single page folios.
> > i.e. it passes the head page of a multi-page folio to a callback
> > that will treat it as a single PAGE_SIZE page, because that's all
> > the writepage callbacks are currently expected to be passed...
> > 
> > So won't this break writeback of dirty multipage folios?
> 
> Yes, it appears it would. But it wouldn't because it's already 'broken'.

It is? Then why isn't XFS broken on existing kernels? Oh, we don't
know because it hasn't been tested?

Seriously - if this really is broken, and this patchset is further
propagating the brokenness, then somebody needs to explain to me why
this is not corrupting data in XFS.

I get it that page/folios are in transition, but passing the head
page of a multi-page folio to an interface that expects a PAGE_SIZE
struct page is a pretty nasty landmine, regardless of how broken the
higher level iteration code already might be.

At minimum, it needs to be documented, though I'd much prefer that
we explicitly duplicate write_cache_pages() as write_cache_folios()
with a callback that takes a folio and change the code to be fully
multi-page folio safe. Then filesystems that support folios (and
large folios) natively can be passed folios without going through
this crappy "folio->page, page->folio" dance because the writepage
APIs are unaware of multi-page folio constructs.
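
Something along these lines is what I'm thinking of. This is an
untested sketch only - writepage_folio_t is a name I'm making up
here, and the elided parts are identical to the write_cache_pages()
code quoted above:

typedef int (*writepage_folio_t)(struct folio *folio,
		struct writeback_control *wbc, void *data);

int write_cache_folios(struct address_space *mapping,
		struct writeback_control *wbc, writepage_folio_t writepage,
		void *data)
{
	struct folio_batch fbatch;
	pgoff_t index, end, done_index;
	xa_mark_t tag;
	int nr_folios, i, error;
	int done = 0;
	int ret = 0;

	folio_batch_init(&fbatch);
	/* range_cyclic/range_whole/tag setup as in write_cache_pages() */

	while (!done && (index <= end)) {
		nr_folios = filemap_get_folios_tag(mapping, &index, end,
				tag, &fbatch);
		if (nr_folios == 0)
			break;

		for (i = 0; i < nr_folios; i++) {
			struct folio *folio = fbatch.folios[i];

			done_index = folio->index;

			folio_lock(folio);
			/* truncate/dirty/writeback checks as in write_cache_pages() */

			trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
			/* hand the filesystem the whole folio, no head page games */
			error = writepage(folio, wbc, data);
			/* error/requeue handling as in write_cache_pages() */

			/* writeback accounting is in pages, not folios */
			wbc->nr_to_write -= folio_nr_pages(folio);
			if (wbc->nr_to_write <= 0 &&
			    wbc->sync_mode == WB_SYNC_NONE) {
				done = 1;
				break;
			}
		}
		folio_batch_release(&fbatch);
		cond_resched();
	}
	/* writeback_index update as in write_cache_pages() */
	return ret;
}

The nr_to_write accounting is the other thing a folio-native iterator
can get right in one place - a large folio should be charged for all
the pages it contains, not just one.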

Then you can convert the individual filesystems using
write_cache_pages() to call write_cache_folios() one at a time,
updating the filesystem callback to do the conversion from folio to
struct page and to check that the folio it has been handed is
order-0....
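
For an unconverted filesystem that first pass can be a trivial shim
around its existing page-based callback. Again, just a sketch -
"foofs" and its functions are stand-in names, and the WARN_ON_ONCE()
is simply one way of getting runtime detection of a large folio
leaking into a path that can't handle it:

/*
 * foofs does not enable large folios in its mappings, so anything
 * other than an order-0 folio here is a bug. Yell about it rather
 * than silently writing back only the head page.
 */
static int foofs_writepage_folio(struct folio *folio,
		struct writeback_control *wbc, void *data)
{
	WARN_ON_ONCE(folio_test_large(folio));
	return foofs_writepage(&folio->page, wbc, data);
}

Then foofs_writepages() calls write_cache_folios(mapping, wbc,
foofs_writepage_folio, data) instead of write_cache_pages(), and
nothing else in the filesystem needs to change until it is ready to
handle folios natively.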

> The current find_get_pages_range_tag() actually has the exact same
> issue. The current code to fill up the pages array is:
> 
> 		pages[ret] = &folio->page;
> 		if (++ret == nr_pages) {
> 			*index = folio->index + folio_nr_pages(folio);
> 			goto out;

"It's already broken so we can make it more broken" isn't an
acceptable answer....

> It's not great to leave it 'broken' but it's something that isn't - or at
> least shouldn't be - creating any problems at present. And I believe Matthew
> has plans to address them at some point before they actually become problems?

You are modifying the interfaces and doing folio conversions that
expose and propagate the brokenness. The brokenness needs to be
either avoided or fixed, not propagated further. Doing the above
write_cache_folios() conversion avoids propagating the brokenness,
adds runtime detection of brokenness, and provides the right
interface for writeback iteration of folios.

Fixing the generic writeback iterator properly is not much extra
work, and it sets the model for filesystems that have copy-pasted
write_cache_pages() and then hacked it around for their own purposes
(e.g. ext4, btrfs) to follow.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


