From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-cifs@vger.kernel.org, cluster-devel@redhat.com,
linux-kernel@vger.kernel.org,
Matthew Wilcox <willy@infradead.org>,
linux-f2fs-devel@lists.sourceforge.net,
Vishal Moola <vishal.moola@gmail.com>,
linux-mm@kvack.org, linux-nilfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-afs@lists.infradead.org,
linux-btrfs@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH 04/23] page-writeback: Convert write_cache_pages() to use filemap_get_folios_tag()
Date: Thu, 3 Nov 2022 19:45:01 -0700 [thread overview]
Message-ID: <Y2R8rRr0ZdrlT32m@magnolia> (raw)
In-Reply-To: <20221104003235.GZ2703033@dread.disaster.area>
On Fri, Nov 04, 2022 at 11:32:35AM +1100, Dave Chinner wrote:
> On Thu, Nov 03, 2022 at 03:28:05PM -0700, Vishal Moola wrote:
> > On Wed, Oct 19, 2022 at 08:01:52AM +1100, Dave Chinner wrote:
> > > On Thu, Sep 01, 2022 at 03:01:19PM -0700, Vishal Moola (Oracle) wrote:
> > > > Converted function to use folios throughout. This is in preparation for
> > > > the removal of find_get_pages_range_tag().
> > > >
> > > > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> > > > ---
> > > > mm/page-writeback.c | 44 +++++++++++++++++++++++---------------------
> > > > 1 file changed, 23 insertions(+), 21 deletions(-)
> > > >
> > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > index 032a7bf8d259..087165357a5a 100644
> > > > --- a/mm/page-writeback.c
> > > > +++ b/mm/page-writeback.c
> > > > @@ -2285,15 +2285,15 @@ int write_cache_pages(struct address_space *mapping,
> > > > int ret = 0;
> > > > int done = 0;
> > > > int error;
> > > > - struct pagevec pvec;
> > > > - int nr_pages;
> > > > + struct folio_batch fbatch;
> > > > + int nr_folios;
> > > > pgoff_t index;
> > > > pgoff_t end; /* Inclusive */
> > > > pgoff_t done_index;
> > > > int range_whole = 0;
> > > > xa_mark_t tag;
> > > >
> > > > - pagevec_init(&pvec);
> > > > + folio_batch_init(&fbatch);
> > > > if (wbc->range_cyclic) {
> > > > index = mapping->writeback_index; /* prev offset */
> > > > end = -1;
> > > > @@ -2313,17 +2313,18 @@ int write_cache_pages(struct address_space *mapping,
> > > > while (!done && (index <= end)) {
> > > > int i;
> > > >
> > > > - nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
> > > > - tag);
> > > > - if (nr_pages == 0)
> > > > + nr_folios = filemap_get_folios_tag(mapping, &index, end,
> > > > + tag, &fbatch);
> > >
> > > This can find and return dirty multi-page folios if the filesystem
> > > enables them in the mapping at instantiation time, right?
> >
> > Yup, it will.
> >
> > > > +
> > > > + if (nr_folios == 0)
> > > > break;
> > > >
> > > > - for (i = 0; i < nr_pages; i++) {
> > > > - struct page *page = pvec.pages[i];
> > > > + for (i = 0; i < nr_folios; i++) {
> > > > + struct folio *folio = fbatch.folios[i];
> > > >
> > > > - done_index = page->index;
> > > > + done_index = folio->index;
> > > >
> > > > - lock_page(page);
> > > > + folio_lock(folio);
> > > >
> > > > /*
> > > > * Page truncated or invalidated. We can freely skip it
> > > > @@ -2333,30 +2334,30 @@ int write_cache_pages(struct address_space *mapping,
> > > > * even if there is now a new, dirty page at the same
> > > > * pagecache address.
> > > > */
> > > > - if (unlikely(page->mapping != mapping)) {
> > > > + if (unlikely(folio->mapping != mapping)) {
> > > > continue_unlock:
> > > > - unlock_page(page);
> > > > + folio_unlock(folio);
> > > > continue;
> > > > }
> > > >
> > > > - if (!PageDirty(page)) {
> > > > + if (!folio_test_dirty(folio)) {
> > > > /* someone wrote it for us */
> > > > goto continue_unlock;
> > > > }
> > > >
> > > > - if (PageWriteback(page)) {
> > > > + if (folio_test_writeback(folio)) {
> > > > if (wbc->sync_mode != WB_SYNC_NONE)
> > > > - wait_on_page_writeback(page);
> > > > + folio_wait_writeback(folio);
> > > > else
> > > > goto continue_unlock;
> > > > }
> > > >
> > > > - BUG_ON(PageWriteback(page));
> > > > - if (!clear_page_dirty_for_io(page))
> > > > + BUG_ON(folio_test_writeback(folio));
> > > > + if (!folio_clear_dirty_for_io(folio))
> > > > goto continue_unlock;
> > > >
> > > > trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> > > > - error = (*writepage)(page, wbc, data);
> > > > + error = writepage(&folio->page, wbc, data);
> > >
> > > Yet, IIUC, this treats all folios as if they are single page folios.
> > > i.e. it passes the head page of a multi-page folio to a callback
> > > that will treat it as a single PAGE_SIZE page, because that's all
> > > the writepage callbacks are currently expected to be passed...
> > >
> > > So won't this break writeback of dirty multipage folios?
> >
> > Yes, it appears it would. But it wouldn't because it's already 'broken'.
>
> It is? Then why isn't XFS broken on existing kernels? Oh, we don't
> know because it hasn't been tested?
>
> Seriously - if this really is broken, and this patchset is further
> propagating the brokenness, then somebody needs to explain to me why
> this is not corrupting data in XFS.
It looks like iomap_do_writepage finds the folio size correctly:

	end_pos = folio_pos(folio) + folio_size(folio);

and iomap_writepage_map will map out the correct number of blocks:

	unsigned nblocks = i_blocks_per_folio(inode, folio);

	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
right? The interface is dangerous because anyone who enables multipage
folios has to be aware that ->writepage can be handed a multipage folio.
(That said, the lack of mention of xfs in the testing plan doesn't give
me much confidence anyone has checked this...)
> I get it that page/folios are in transition, but passing a
> multi-page folio page to an interface that expects a PAGE_SIZE
> struct page is a pretty nasty landmine, regardless of how broken the
> higher level iteration code already might be.
>
> At minimum, it needs to be documented, though I'd much prefer that
> we explicitly duplicate write_cache_pages() as write_cache_folios()
> with a callback that takes a folio and change the code to be fully
> multi-page folio safe. Then filesystems that support folios (and
> large folios) natively can be passed folios without going through
> this crappy "folio->page, page->folio" dance because the writepage
> APIs are unaware of multi-page folio constructs.
Agree. Build the new one, move callers over, and kill the old one.
> Then you can convert the individual filesystems using
> write_cache_pages() to call write_cache_folios() one at a time,
> updating the filesystem callback to do the conversion from folio to
> struct page and checking that it is an order-0 page that it has been
> handed....
>
> > The current find_get_pages_range_tag() actually has the exact same
> > issue. The current code to fill up the pages array is:
> >
> > pages[ret] = &folio->page;
> > if (++ret == nr_pages) {
> > *index = folio->index + folio_nr_pages(folio);
> > goto out;
>
> "It's already broken so we can make it more broken" isn't an
> acceptable answer....
>
> > It's not great to leave it 'broken' but it's something that isn't - or at
> > least shouldn't be - creating any problems at present. And I believe Matthew
> > has plans to address them at some point before they actually become problems?
>
> You are modifying the interfaces and doing folio conversions that
> expose and propagate the brokenness. The brokenness needs to be
> either avoided or fixed, not propagated further. Doing the above
> write_cache_folios() conversion avoids propagating the
> brokenness, adds runtime detection of brokenness, and provides the
> right interface for writeback iteration of folios.
>
> Fixing the generic writeback iterator properly is not much extra
> work, and it sets the model for filesystems that have copy-pasted
> write_cache_pages() and then hacked it around for their own purposes
> (e.g. ext4, btrfs) to follow.
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com