All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data
@ 2022-12-02 18:39 Jan Kara
  2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Hello,

this patch series modifies ext4 so that we stop using ext4_writepage() for
writeout of ordered data during transaction commit (through
generic_writepages() from jbd2_journal_submit_inode_data_buffers()). Instead we
directly call ext4_writepages() from the
ext4_journal_submit_inode_data_buffers(). This is part of Christoph's effort
to get rid of the .writepage() callback in all filesystems.

I have also modified ext4_writepages() to use write_cache_pages() instead of
generic_writepages() so now we don't expose .writepage hook at all. We still
keep ext4_writepage() as a callback for write_cache_pages(). We should refactor
that path as well and get rid of ext4_writepage() completely but that is for a
separate cleanup. Also note that jbd2 still uses generic_writepages() in its
jbd2_journal_submit_inode_data_buffers() helper because it is still used from
OCFS2. Again, something to be dealt with in a separate patchset.

Changes since v1:
* Added Reviewed-by tags from Ritesh
* Added patch to get rid of generic_writepages() in ext4_writepages()
* Added patch to get rid of .writepage hook

								Honza
Previous versions:
Link: http://lore.kernel.org/r/20221130162435.2324-1-jack@suse.cz # v1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/11] ext4: Handle redirtying in ext4_bio_write_page()
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 2/11] ext4: Move keep_towrite handling to ext4_bio_write_page() Jan Kara
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Since we want to transition transaction commits to use ext4_writepages()
for writing back ordered, add handling of page redirtying into
ext4_bio_write_page(). Also move buffer dirty bit clearing into the same
place other buffer state handling.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 97fa7b4c645f..4e68ace86f11 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -482,6 +482,13 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			/* A hole? We can safely clear the dirty bit */
 			if (!buffer_mapped(bh))
 				clear_buffer_dirty(bh);
+			/*
+			 * Keeping dirty some buffer we cannot write? Make
+			 * sure to redirty the page. This happens e.g. when
+			 * doing writeout for transaction commit.
+			 */
+			if (buffer_dirty(bh) && !PageDirty(page))
+				redirty_page_for_writepage(wbc, page);
 			if (io->io_bio)
 				ext4_io_submit(io);
 			continue;
@@ -489,6 +496,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		if (buffer_new(bh))
 			clear_buffer_new(bh);
 		set_buffer_async_write(bh);
+		clear_buffer_dirty(bh);
 		nr_to_submit++;
 	} while ((bh = bh->b_this_page) != head);
 
@@ -532,7 +540,10 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			printk_ratelimited(KERN_ERR "%s: ret = %d\n", __func__, ret);
 			redirty_page_for_writepage(wbc, page);
 			do {
-				clear_buffer_async_write(bh);
+				if (buffer_async_write(bh)) {
+					clear_buffer_async_write(bh);
+					set_buffer_dirty(bh);
+				}
 				bh = bh->b_this_page;
 			} while (bh != head);
 			goto unlock;
@@ -546,7 +557,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		io_submit_add_bh(io, inode,
 				 bounce_page ? bounce_page : page, bh);
 		nr_submitted++;
-		clear_buffer_dirty(bh);
 	} while ((bh = bh->b_this_page) != head);
 
 unlock:
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/11] ext4: Move keep_towrite handling to ext4_bio_write_page()
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
  2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 3/11] ext4: Remove nr_submitted from ext4_bio_write_page() Jan Kara
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

When we are writing back page but we cannot for some reason write all
its buffers (e.g. because we cannot allocate blocks in current context) we
have to keep TOWRITE tag set in the mapping as otherwise racing
WB_SYNC_ALL writeback that could write these buffers can skip the page
and result in data loss. We will need this logic for writeback during
transaction commit so move the logic from ext4_writepage() to
ext4_bio_write_page().

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h    |  3 +--
 fs/ext4/inode.c   |  6 ++----
 fs/ext4/page-io.c | 36 +++++++++++++++++++++---------------
 3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8d5453852f98..1b3bffc04fd0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3756,8 +3756,7 @@ extern void ext4_end_io_rsv_work(struct work_struct *work);
 extern void ext4_io_submit(struct ext4_io_submit *io);
 extern int ext4_bio_write_page(struct ext4_io_submit *io,
 			       struct page *page,
-			       int len,
-			       bool keep_towrite);
+			       int len);
 extern struct ext4_io_end_vec *ext4_alloc_io_end_vec(ext4_io_end_t *io_end);
 extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end);
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2b5ef1b64249..43eb175d0c1c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2009,7 +2009,6 @@ static int ext4_writepage(struct page *page,
 	struct buffer_head *page_bufs = NULL;
 	struct inode *inode = page->mapping->host;
 	struct ext4_io_submit io_submit;
-	bool keep_towrite = false;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) {
 		folio_invalidate(folio, 0, folio_size(folio));
@@ -2067,7 +2066,6 @@ static int ext4_writepage(struct page *page,
 			unlock_page(page);
 			return 0;
 		}
-		keep_towrite = true;
 	}
 
 	if (PageChecked(page) && ext4_should_journal_data(inode))
@@ -2084,7 +2082,7 @@ static int ext4_writepage(struct page *page,
 		unlock_page(page);
 		return -ENOMEM;
 	}
-	ret = ext4_bio_write_page(&io_submit, page, len, keep_towrite);
+	ret = ext4_bio_write_page(&io_submit, page, len);
 	ext4_io_submit(&io_submit);
 	/* Drop io_end reference we got from init */
 	ext4_put_io_end_defer(io_submit.io_end);
@@ -2118,7 +2116,7 @@ static int mpage_submit_page(struct mpage_da_data *mpd, struct page *page)
 		len = size & ~PAGE_MASK;
 	else
 		len = PAGE_SIZE;
-	err = ext4_bio_write_page(&mpd->io_submit, page, len, false);
+	err = ext4_bio_write_page(&mpd->io_submit, page, len);
 	if (!err)
 		mpd->wbc->nr_to_write--;
 	mpd->first_page++;
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4e68ace86f11..4f9ecacd10aa 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -430,8 +430,7 @@ static void io_submit_add_bh(struct ext4_io_submit *io,
 
 int ext4_bio_write_page(struct ext4_io_submit *io,
 			struct page *page,
-			int len,
-			bool keep_towrite)
+			int len)
 {
 	struct page *bounce_page = NULL;
 	struct inode *inode = page->mapping->host;
@@ -441,14 +440,11 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 	int nr_submitted = 0;
 	int nr_to_submit = 0;
 	struct writeback_control *wbc = io->io_wbc;
+	bool keep_towrite = false;
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(PageWriteback(page));
 
-	if (keep_towrite)
-		set_page_writeback_keepwrite(page);
-	else
-		set_page_writeback(page);
 	ClearPageError(page);
 
 	/*
@@ -483,12 +479,17 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			if (!buffer_mapped(bh))
 				clear_buffer_dirty(bh);
 			/*
-			 * Keeping dirty some buffer we cannot write? Make
-			 * sure to redirty the page. This happens e.g. when
-			 * doing writeout for transaction commit.
+			 * Keeping dirty some buffer we cannot write? Make sure
+			 * to redirty the page and keep TOWRITE tag so that
+			 * racing WB_SYNC_ALL writeback does not skip the page.
+			 * This happens e.g. when doing writeout for
+			 * transaction commit.
 			 */
-			if (buffer_dirty(bh) && !PageDirty(page))
-				redirty_page_for_writepage(wbc, page);
+			if (buffer_dirty(bh)) {
+				if (!PageDirty(page))
+					redirty_page_for_writepage(wbc, page);
+				keep_towrite = true;
+			}
 			if (io->io_bio)
 				ext4_io_submit(io);
 			continue;
@@ -500,6 +501,10 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		nr_to_submit++;
 	} while ((bh = bh->b_this_page) != head);
 
+	/* Nothing to submit? Just unlock the page... */
+	if (!nr_to_submit)
+		goto unlock;
+
 	bh = head = page_buffers(page);
 
 	/*
@@ -550,6 +555,11 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		}
 	}
 
+	if (keep_towrite)
+		set_page_writeback_keepwrite(page);
+	else
+		set_page_writeback(page);
+
 	/* Now submit buffers to write */
 	do {
 		if (!buffer_async_write(bh))
@@ -558,11 +568,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 				 bounce_page ? bounce_page : page, bh);
 		nr_submitted++;
 	} while ((bh = bh->b_this_page) != head);
-
 unlock:
 	unlock_page(page);
-	/* Nothing submitted - we have to end page writeback */
-	if (!nr_submitted)
-		end_page_writeback(page);
 	return ret;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/11] ext4: Remove nr_submitted from ext4_bio_write_page()
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
  2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
  2022-12-02 18:39 ` [PATCH v2 2/11] ext4: Move keep_towrite handling to ext4_bio_write_page() Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 4/11] ext4: Drop pointless IO submission " Jan Kara
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

nr_submitted is the same as nr_to_submit. Drop one of them.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4f9ecacd10aa..2bdfb8a046d9 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -437,7 +437,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 	unsigned block_start;
 	struct buffer_head *bh, *head;
 	int ret = 0;
-	int nr_submitted = 0;
 	int nr_to_submit = 0;
 	struct writeback_control *wbc = io->io_wbc;
 	bool keep_towrite = false;
@@ -566,7 +565,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			continue;
 		io_submit_add_bh(io, inode,
 				 bounce_page ? bounce_page : page, bh);
-		nr_submitted++;
 	} while ((bh = bh->b_this_page) != head);
 unlock:
 	unlock_page(page);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 4/11] ext4: Drop pointless IO submission from ext4_bio_write_page()
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (2 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 3/11] ext4: Remove nr_submitted from ext4_bio_write_page() Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 5/11] ext4: Add support for writepages calls that cannot map blocks Jan Kara
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

We submit outstanding IO in ext4_bio_write_page() if we find a buffer we
are not going to write. This is however pointless because we already
handle submission of previous IO in case we detect newly added buffer
head is discontiguous. So just delete the pointless IO submission call.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2bdfb8a046d9..beaec6d81074 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -489,8 +489,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 					redirty_page_for_writepage(wbc, page);
 				keep_towrite = true;
 			}
-			if (io->io_bio)
-				ext4_io_submit(io);
 			continue;
 		}
 		if (buffer_new(bh))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 5/11] ext4: Add support for writepages calls that cannot map blocks
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (3 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 4/11] ext4: Drop pointless IO submission " Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 6/11] ext4: Provide ext4_do_writepages() Jan Kara
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Add support for calls to ext4_writepages() than cannot map blocks. These
will be issued from jbd2 transaction commit code.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 61 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 47 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 43eb175d0c1c..91cf9c0f2a7e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1557,6 +1557,7 @@ struct mpage_da_data {
 	struct ext4_map_blocks map;
 	struct ext4_io_submit io_submit;	/* IO submission data */
 	unsigned int do_map:1;
+	unsigned int can_map:1;	/* Can writepages call map blocks? */
 	unsigned int scanned_until_end:1;
 };
 
@@ -2549,18 +2550,33 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)
 				MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
 }
 
+/* Return true if the page needs to be written as part of transaction commit */
+static bool ext4_page_nomap_can_writeout(struct page *page)
+{
+	struct buffer_head *bh, *head;
+
+	bh = head = page_buffers(page);
+	do {
+		if (buffer_dirty(bh) && buffer_mapped(bh) && !buffer_delay(bh))
+			return true;
+	} while ((bh = bh->b_this_page) != head);
+	return false;
+}
+
 /*
  * mpage_prepare_extent_to_map - find & lock contiguous range of dirty pages
- * 				 and underlying extent to map
+ * 				 needing mapping, submit mapped pages
  *
  * @mpd - where to look for pages
  *
  * Walk dirty pages in the mapping. If they are fully mapped, submit them for
- * IO immediately. When we find a page which isn't mapped we start accumulating
- * extent of buffers underlying these pages that needs mapping (formed by
- * either delayed or unwritten buffers). We also lock the pages containing
- * these buffers. The extent found is returned in @mpd structure (starting at
- * mpd->lblk with length mpd->len blocks).
+ * IO immediately. If we cannot map blocks, we submit just already mapped
+ * buffers in the page for IO and keep page dirty. When we can map blocks and
+ * we find a page which isn't mapped we start accumulating extent of buffers
+ * underlying these pages that needs mapping (formed by either delayed or
+ * unwritten buffers). We also lock the pages containing these buffers. The
+ * extent found is returned in @mpd structure (starting at mpd->lblk with
+ * length mpd->len blocks).
  *
  * Note that this function can attach bios to one io_end structure which are
  * neither logically nor physically contiguous. Although it may seem as an
@@ -2651,14 +2667,30 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (mpd->map.m_len == 0)
 				mpd->first_page = page->index;
 			mpd->next_page = page->index + 1;
-			/* Add all dirty buffers to mpd */
-			lblk = ((ext4_lblk_t)page->index) <<
-				(PAGE_SHIFT - blkbits);
-			head = page_buffers(page);
-			err = mpage_process_page_bufs(mpd, head, head, lblk);
-			if (err <= 0)
-				goto out;
-			err = 0;
+			/*
+			 * Writeout for transaction commit where we cannot
+			 * modify metadata is simple. Just submit the page.
+			 */
+			if (!mpd->can_map) {
+				if (ext4_page_nomap_can_writeout(page)) {
+					err = mpage_submit_page(mpd, page);
+					if (err < 0)
+						goto out;
+				} else {
+					unlock_page(page);
+					mpd->first_page++;
+				}
+			} else {
+				/* Add all dirty buffers to mpd */
+				lblk = ((ext4_lblk_t)page->index) <<
+					(PAGE_SHIFT - blkbits);
+				head = page_buffers(page);
+				err = mpage_process_page_bufs(mpd, head, head,
+							      lblk);
+				if (err <= 0)
+					goto out;
+				err = 0;
+			}
 			left--;
 		}
 		pagevec_release(&pvec);
@@ -2778,6 +2810,7 @@ static int ext4_writepages(struct address_space *mapping,
 	 */
 	mpd.do_map = 0;
 	mpd.scanned_until_end = 0;
+	mpd.can_map = 1;
 	mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
 	if (!mpd.io_submit.io_end) {
 		ret = -ENOMEM;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 6/11] ext4: Provide ext4_do_writepages()
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (4 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 5/11] ext4: Add support for writepages calls that cannot map blocks Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 7/11] ext4: Move percpu_rwsem protection into ext4_writepages() Jan Kara
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Provide ext4_do_writepages() function that takes mpage_da_data as an
argument and make ext4_writepages() just a simple wrapper around it. No
functional changes.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 96 +++++++++++++++++++++++++++----------------------
 1 file changed, 54 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 91cf9c0f2a7e..fa145c6b2630 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1543,9 +1543,12 @@ void ext4_da_release_space(struct inode *inode, int to_free)
  */
 
 struct mpage_da_data {
+	/* These are input fields for ext4_do_writepages() */
 	struct inode *inode;
 	struct writeback_control *wbc;
+	unsigned int can_map:1;	/* Can writepages call map blocks? */
 
+	/* These are internal state of ext4_do_writepages() */
 	pgoff_t first_page;	/* The first page to write */
 	pgoff_t next_page;	/* Current page to examine */
 	pgoff_t last_page;	/* Last page to examine */
@@ -1557,7 +1560,6 @@ struct mpage_da_data {
 	struct ext4_map_blocks map;
 	struct ext4_io_submit io_submit;	/* IO submission data */
 	unsigned int do_map:1;
-	unsigned int can_map:1;	/* Can writepages call map blocks? */
 	unsigned int scanned_until_end:1;
 };
 
@@ -2703,16 +2705,16 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	return err;
 }
 
-static int ext4_writepages(struct address_space *mapping,
-			   struct writeback_control *wbc)
+static int ext4_do_writepages(struct mpage_da_data *mpd)
 {
+	struct writeback_control *wbc = mpd->wbc;
 	pgoff_t	writeback_index = 0;
 	long nr_to_write = wbc->nr_to_write;
 	int range_whole = 0;
 	int cycled = 1;
 	handle_t *handle = NULL;
-	struct mpage_da_data mpd;
-	struct inode *inode = mapping->host;
+	struct inode *inode = mpd->inode;
+	struct address_space *mapping = inode->i_mapping;
 	int needed_blocks, rsv_blocks = 0, ret = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
 	struct blk_plug plug;
@@ -2787,19 +2789,18 @@ static int ext4_writepages(struct address_space *mapping,
 		writeback_index = mapping->writeback_index;
 		if (writeback_index)
 			cycled = 0;
-		mpd.first_page = writeback_index;
-		mpd.last_page = -1;
+		mpd->first_page = writeback_index;
+		mpd->last_page = -1;
 	} else {
-		mpd.first_page = wbc->range_start >> PAGE_SHIFT;
-		mpd.last_page = wbc->range_end >> PAGE_SHIFT;
+		mpd->first_page = wbc->range_start >> PAGE_SHIFT;
+		mpd->last_page = wbc->range_end >> PAGE_SHIFT;
 	}
 
-	mpd.inode = inode;
-	mpd.wbc = wbc;
-	ext4_io_submit_init(&mpd.io_submit, wbc);
+	ext4_io_submit_init(&mpd->io_submit, wbc);
 retry:
 	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
-		tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
+		tag_pages_for_writeback(mapping, mpd->first_page,
+					mpd->last_page);
 	blk_start_plug(&plug);
 
 	/*
@@ -2808,28 +2809,27 @@ static int ext4_writepages(struct address_space *mapping,
 	 * in the block layer on device congestion while having transaction
 	 * started.
 	 */
-	mpd.do_map = 0;
-	mpd.scanned_until_end = 0;
-	mpd.can_map = 1;
-	mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
-	if (!mpd.io_submit.io_end) {
+	mpd->do_map = 0;
+	mpd->scanned_until_end = 0;
+	mpd->io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+	if (!mpd->io_submit.io_end) {
 		ret = -ENOMEM;
 		goto unplug;
 	}
-	ret = mpage_prepare_extent_to_map(&mpd);
+	ret = mpage_prepare_extent_to_map(mpd);
 	/* Unlock pages we didn't use */
-	mpage_release_unused_pages(&mpd, false);
+	mpage_release_unused_pages(mpd, false);
 	/* Submit prepared bio */
-	ext4_io_submit(&mpd.io_submit);
-	ext4_put_io_end_defer(mpd.io_submit.io_end);
-	mpd.io_submit.io_end = NULL;
+	ext4_io_submit(&mpd->io_submit);
+	ext4_put_io_end_defer(mpd->io_submit.io_end);
+	mpd->io_submit.io_end = NULL;
 	if (ret < 0)
 		goto unplug;
 
-	while (!mpd.scanned_until_end && wbc->nr_to_write > 0) {
+	while (!mpd->scanned_until_end && wbc->nr_to_write > 0) {
 		/* For each extent of pages we use new io_end */
-		mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
-		if (!mpd.io_submit.io_end) {
+		mpd->io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+		if (!mpd->io_submit.io_end) {
 			ret = -ENOMEM;
 			break;
 		}
@@ -2853,16 +2853,16 @@ static int ext4_writepages(struct address_space *mapping,
 			       "%ld pages, ino %lu; err %d", __func__,
 				wbc->nr_to_write, inode->i_ino, ret);
 			/* Release allocated io_end */
-			ext4_put_io_end(mpd.io_submit.io_end);
-			mpd.io_submit.io_end = NULL;
+			ext4_put_io_end(mpd->io_submit.io_end);
+			mpd->io_submit.io_end = NULL;
 			break;
 		}
-		mpd.do_map = 1;
+		mpd->do_map = 1;
 
-		trace_ext4_da_write_pages(inode, mpd.first_page, mpd.wbc);
-		ret = mpage_prepare_extent_to_map(&mpd);
-		if (!ret && mpd.map.m_len)
-			ret = mpage_map_and_submit_extent(handle, &mpd,
+		trace_ext4_da_write_pages(inode, mpd->first_page, wbc);
+		ret = mpage_prepare_extent_to_map(mpd);
+		if (!ret && mpd->map.m_len)
+			ret = mpage_map_and_submit_extent(handle, mpd,
 					&give_up_on_write);
 		/*
 		 * Caution: If the handle is synchronous,
@@ -2877,12 +2877,12 @@ static int ext4_writepages(struct address_space *mapping,
 		if (!ext4_handle_valid(handle) || handle->h_sync == 0) {
 			ext4_journal_stop(handle);
 			handle = NULL;
-			mpd.do_map = 0;
+			mpd->do_map = 0;
 		}
 		/* Unlock pages we didn't use */
-		mpage_release_unused_pages(&mpd, give_up_on_write);
+		mpage_release_unused_pages(mpd, give_up_on_write);
 		/* Submit prepared bio */
-		ext4_io_submit(&mpd.io_submit);
+		ext4_io_submit(&mpd->io_submit);
 
 		/*
 		 * Drop our io_end reference we got from init. We have
@@ -2892,11 +2892,11 @@ static int ext4_writepages(struct address_space *mapping,
 		 * up doing unwritten extent conversion.
 		 */
 		if (handle) {
-			ext4_put_io_end_defer(mpd.io_submit.io_end);
+			ext4_put_io_end_defer(mpd->io_submit.io_end);
 			ext4_journal_stop(handle);
 		} else
-			ext4_put_io_end(mpd.io_submit.io_end);
-		mpd.io_submit.io_end = NULL;
+			ext4_put_io_end(mpd->io_submit.io_end);
+		mpd->io_submit.io_end = NULL;
 
 		if (ret == -ENOSPC && sbi->s_journal) {
 			/*
@@ -2916,8 +2916,8 @@ static int ext4_writepages(struct address_space *mapping,
 	blk_finish_plug(&plug);
 	if (!ret && !cycled && wbc->nr_to_write > 0) {
 		cycled = 1;
-		mpd.last_page = writeback_index - 1;
-		mpd.first_page = 0;
+		mpd->last_page = writeback_index - 1;
+		mpd->first_page = 0;
 		goto retry;
 	}
 
@@ -2927,7 +2927,7 @@ static int ext4_writepages(struct address_space *mapping,
 		 * Set the writeback_index so that range_cyclic
 		 * mode will write it back later
 		 */
-		mapping->writeback_index = mpd.first_page;
+		mapping->writeback_index = mpd->first_page;
 
 out_writepages:
 	trace_ext4_writepages_result(inode, wbc, ret,
@@ -2936,6 +2936,18 @@ static int ext4_writepages(struct address_space *mapping,
 	return ret;
 }
 
+static int ext4_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	struct mpage_da_data mpd = {
+		.inode = mapping->host,
+		.wbc = wbc,
+		.can_map = 1,
+	};
+
+	return ext4_do_writepages(&mpd);
+}
+
 static int ext4_dax_writepages(struct address_space *mapping,
 			       struct writeback_control *wbc)
 {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 7/11] ext4: Move percpu_rwsem protection into ext4_writepages()
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (5 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 6/11] ext4: Provide ext4_do_writepages() Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 8/11] ext4: Switch to using ext4_do_writepages() for ordered data writeout Jan Kara
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Move protection by percpu_rwsem from ext4_do_writepages() to
ext4_writepages(). We will not want to grab this protection during
transaction commits as that would be prone to deadlocks and the
protection is not needed. Move the shutdown state checking as well since
we want to be able to complete commit while the shutdown is in progress.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fa145c6b2630..aeef57d907c6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2720,10 +2720,6 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 	struct blk_plug plug;
 	bool give_up_on_write = false;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
-		return -EIO;
-
-	percpu_down_read(&sbi->s_writepages_rwsem);
 	trace_ext4_writepages(inode, wbc);
 
 	/*
@@ -2932,20 +2928,28 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 out_writepages:
 	trace_ext4_writepages_result(inode, wbc, ret,
 				     nr_to_write - wbc->nr_to_write);
-	percpu_up_read(&sbi->s_writepages_rwsem);
 	return ret;
 }
 
 static int ext4_writepages(struct address_space *mapping,
 			   struct writeback_control *wbc)
 {
+	struct super_block *sb = mapping->host->i_sb;
 	struct mpage_da_data mpd = {
 		.inode = mapping->host,
 		.wbc = wbc,
 		.can_map = 1,
 	};
+	int ret;
+
+	if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
+		return -EIO;
 
-	return ext4_do_writepages(&mpd);
+	percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
+	ret = ext4_do_writepages(&mpd);
+	percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
+
+	return ret;
 }
 
 static int ext4_dax_writepages(struct address_space *mapping,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 8/11] ext4: Switch to using ext4_do_writepages() for ordered data writeout
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (6 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 7/11] ext4: Move percpu_rwsem protection into ext4_writepages() Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-02 18:39 ` [PATCH v2 9/11] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Use the standard writepages method (ext4_do_writepages()) to perform
writeout of ordered data during journal commit.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/inode.c | 16 ++++++++++++++++
 fs/ext4/super.c |  3 +--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1b3bffc04fd0..07b55cc48578 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2999,6 +2999,7 @@ extern void ext4_set_inode_flags(struct inode *, bool init);
 extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
+extern int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 			     loff_t lstart, loff_t lend);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index aeef57d907c6..88772e1ddd3b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2952,6 +2952,22 @@ static int ext4_writepages(struct address_space *mapping,
 	return ret;
 }
 
+int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = LONG_MAX,
+		.range_start = jinode->i_dirty_start,
+		.range_end = jinode->i_dirty_end,
+	};
+	struct mpage_da_data mpd = {
+		.inode = jinode->i_vfs_inode,
+		.wbc = &wbc,
+		.can_map = 0,
+	};
+	return ext4_do_writepages(&mpd);
+}
+
 static int ext4_dax_writepages(struct address_space *mapping,
 			       struct writeback_control *wbc)
 {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7cdd2138c897..c02329dd7574 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -540,8 +540,7 @@ static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
 	if (ext4_should_journal_data(jinode->i_vfs_inode))
 		ret = ext4_journalled_submit_inode_data_buffers(jinode);
 	else
-		ret = jbd2_journal_submit_inode_data_buffers(jinode);
-
+		ret = ext4_normal_submit_inode_data_buffers(jinode);
 	return ret;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 9/11] ext4: Switch to using write_cache_pages() for data=journal writeout
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (7 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 8/11] ext4: Switch to using ext4_do_writepages() for ordered data writeout Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-04  6:58   ` Christoph Hellwig
  2022-12-02 18:39 ` [PATCH v2 0/11] ext4: Stop providing .writepage hook Jan Kara
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Instead of using generic_writepages(), let's use write_cache_pages() for
writeout of journalled data. It will allow us to stop providing
.writepage callback. Our data=journal writeback path would benefit from
a larger cleanup and refactoring but that's for a separate cleanup
series.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 88772e1ddd3b..d14c6d44bdf1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2705,6 +2705,12 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	return err;
 }
 
+static int __writepage(struct page *page, struct writeback_control *wbc,
+		       void *data)
+{
+	return ext4_writepage(page, wbc);
+}
+
 static int ext4_do_writepages(struct mpage_da_data *mpd)
 {
 	struct writeback_control *wbc = mpd->wbc;
@@ -2731,7 +2737,9 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 		goto out_writepages;
 
 	if (ext4_should_journal_data(inode)) {
-		ret = generic_writepages(mapping, wbc);
+		blk_start_plug(&plug);
+		ret = write_cache_pages(mapping, wbc, __writepage, mapping);
+		blk_finish_plug(&plug);
 		goto out_writepages;
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 0/11] ext4: Stop providing .writepage hook
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (8 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 9/11] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-04  6:59   ` Christoph Hellwig
  2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Remove ordered data support from ext4_writepage() Jan Kara
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

Now we don't need .writepage hook for anything anymore. Reclaim is fine
with relying on .writepages to clean pages and we often couldn't do much
from the .writepage callback anyway. We only need to provide
.migrate_folio callback for the ext4_journalled_aops - let's use
buffer_migrate_page_norefs() there so that buffers cannot be modified
under jdb2's hands.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d14c6d44bdf1..1c9dec0d5109 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3717,7 +3717,6 @@ static int ext4_iomap_swap_activate(struct swap_info_struct *sis,
 static const struct address_space_operations ext4_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
-	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_write_end,
@@ -3735,7 +3734,6 @@ static const struct address_space_operations ext4_aops = {
 static const struct address_space_operations ext4_journalled_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
-	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_journalled_write_end,
@@ -3744,6 +3742,7 @@ static const struct address_space_operations ext4_journalled_aops = {
 	.invalidate_folio	= ext4_journalled_invalidate_folio,
 	.release_folio		= ext4_release_folio,
 	.direct_IO		= noop_direct_IO,
+	.migrate_folio		= buffer_migrate_folio_norefs,
 	.is_partially_uptodate  = block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 	.swap_activate		= ext4_iomap_swap_activate,
@@ -3752,7 +3751,6 @@ static const struct address_space_operations ext4_journalled_aops = {
 static const struct address_space_operations ext4_da_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
-	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_da_write_begin,
 	.write_end		= ext4_da_write_end,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 1/11] ext4: Remove ordered data support from ext4_writepage()
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (9 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 0/11] ext4: Stop providing .writepage hook Jan Kara
@ 2022-12-02 18:39 ` Jan Kara
  2022-12-04  7:06   ` Christoph Hellwig
  2022-12-03  0:52 ` [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Ritesh Harjani
  2022-12-04  6:56 ` Christoph Hellwig
  12 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2022-12-02 18:39 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani, Jan Kara

ext4_writepage() should not be called for ordered data anymore. Remove
support for it from the function.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 128 ++++++------------------------------------------
 1 file changed, 16 insertions(+), 112 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1c9dec0d5109..1e125538ceec 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1642,12 +1642,6 @@ static void ext4_print_free_blocks(struct inode *inode)
 	return;
 }
 
-static int ext4_bh_delay_or_unwritten(handle_t *handle, struct inode *inode,
-				      struct buffer_head *bh)
-{
-	return (buffer_delay(bh) || buffer_unwritten(bh)) && buffer_dirty(bh);
-}
-
 /*
  * ext4_insert_delayed_block - adds a delayed block to the extents status
  *                             tree, incrementing the reserved cluster/block
@@ -1962,56 +1956,17 @@ static int __ext4_journalled_writepage(struct page *page,
 }
 
 /*
- * Note that we don't need to start a transaction unless we're journaling data
- * because we should have holes filled from ext4_page_mkwrite(). We even don't
- * need to file the inode to the transaction's list in ordered mode because if
- * we are writing back data added by write(), the inode is already there and if
- * we are writing back data modified via mmap(), no one guarantees in which
- * transaction the data will hit the disk. In case we are journaling data, we
- * cannot start transaction directly because transaction start ranks above page
- * lock so we have to do some magic.
- *
- * This function can get called via...
- *   - ext4_writepages after taking page lock (have journal handle)
- *   - journal_submit_inode_data_buffers (no journal handle)
- *   - shrink_page_list via the kswapd/direct reclaim (no journal handle)
- *   - grab_page_cache when doing write_begin (have journal handle)
- *
- * We don't do any block allocation in this function. If we have page with
- * multiple blocks we need to write those buffer_heads that are mapped. This
- * is important for mmaped based write. So if we do with blocksize 1K
- * truncate(f, 1024);
- * a = mmap(f, 0, 4096);
- * a[0] = 'a';
- * truncate(f, 4096);
- * we have in the page first buffer_head mapped via page_mkwrite call back
- * but other buffer_heads would be unmapped but dirty (dirty done via the
- * do_wp_page). So writepage should write the first block. If we modify
- * the mmap area beyond 1024 we will again get a page_fault and the
- * page_mkwrite callback will do the block allocation and mark the
- * buffer_heads mapped.
- *
- * We redirty the page if we have any buffer_heads that is either delay or
- * unwritten in the page.
- *
- * We can get recursively called as show below.
- *
- *	ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
- *		ext4_writepage()
- *
- * But since we don't do any block allocation we should not deadlock.
- * Page also have the dirty flag cleared so we don't get recurive page_lock.
+ * This function is now used only when journaling data. We cannot start
+ * transaction directly because transaction start ranks above page lock so we
+ * have to do some magic.
  */
-static int ext4_writepage(struct page *page,
-			  struct writeback_control *wbc)
+static int ext4_writepage(struct page *page, struct writeback_control *wbc,
+			  void *data)
 {
 	struct folio *folio = page_folio(page);
-	int ret = 0;
 	loff_t size;
 	unsigned int len;
-	struct buffer_head *page_bufs = NULL;
 	struct inode *inode = page->mapping->host;
-	struct ext4_io_submit io_submit;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) {
 		folio_invalidate(folio, 0, folio_size(folio));
@@ -2036,60 +1991,16 @@ static int ext4_writepage(struct page *page,
 		return 0;
 	}
 
-	page_bufs = page_buffers(page);
-	/*
-	 * We cannot do block allocation or other extent handling in this
-	 * function. If there are buffers needing that, we have to redirty
-	 * the page. But we may reach here when we do a journal commit via
-	 * journal_submit_inode_data_buffers() and in that case we must write
-	 * allocated buffers to achieve data=ordered mode guarantees.
-	 *
-	 * Also, if there is only one buffer per page (the fs block
-	 * size == the page size), if one buffer needs block
-	 * allocation or needs to modify the extent tree to clear the
-	 * unwritten flag, we know that the page can't be written at
-	 * all, so we might as well refuse the write immediately.
-	 * Unfortunately if the block size != page size, we can't as
-	 * easily detect this case using ext4_walk_page_buffers(), but
-	 * for the extremely common case, this is an optimization that
-	 * skips a useless round trip through ext4_bio_write_page().
-	 */
-	if (ext4_walk_page_buffers(NULL, inode, page_bufs, 0, len, NULL,
-				   ext4_bh_delay_or_unwritten)) {
-		redirty_page_for_writepage(wbc, page);
-		if ((current->flags & PF_MEMALLOC) ||
-		    (inode->i_sb->s_blocksize == PAGE_SIZE)) {
-			/*
-			 * For memory cleaning there's no point in writing only
-			 * some buffers. So just bail out. Warn if we came here
-			 * from direct reclaim.
-			 */
-			WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD))
-							== PF_MEMALLOC);
-			unlock_page(page);
-			return 0;
-		}
-	}
-
-	if (PageChecked(page) && ext4_should_journal_data(inode))
-		/*
-		 * It's mmapped pagecache.  Add buffers and journal it.  There
-		 * doesn't seem much point in redirtying the page here.
-		 */
-		return __ext4_journalled_writepage(page, len);
-
-	ext4_io_submit_init(&io_submit, wbc);
-	io_submit.io_end = ext4_init_io_end(inode, GFP_NOFS);
-	if (!io_submit.io_end) {
-		redirty_page_for_writepage(wbc, page);
+	WARN_ON_ONCE(!ext4_should_journal_data(inode));
+	if (!PageChecked(page)) {
 		unlock_page(page);
-		return -ENOMEM;
+		return 0;
 	}
-	ret = ext4_bio_write_page(&io_submit, page, len);
-	ext4_io_submit(&io_submit);
-	/* Drop io_end reference we got from init */
-	ext4_put_io_end_defer(io_submit.io_end);
-	return ret;
+	/*
+	 * It's mmapped pagecache.  Add buffers and journal it.  There
+	 * doesn't seem much point in redirtying the page here.
+	 */
+	return __ext4_journalled_writepage(page, len);
 }
 
 static int mpage_submit_page(struct mpage_da_data *mpd, struct page *page)
@@ -2705,12 +2616,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	return err;
 }
 
-static int __writepage(struct page *page, struct writeback_control *wbc,
-		       void *data)
-{
-	return ext4_writepage(page, wbc);
-}
-
 static int ext4_do_writepages(struct mpage_da_data *mpd)
 {
 	struct writeback_control *wbc = mpd->wbc;
@@ -2738,7 +2643,7 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 
 	if (ext4_should_journal_data(inode)) {
 		blk_start_plug(&plug);
-		ret = write_cache_pages(mapping, wbc, __writepage, mapping);
+		ret = write_cache_pages(mapping, wbc, ext4_writepage, mapping);
 		blk_finish_plug(&plug);
 		goto out_writepages;
 	}
@@ -3152,9 +3057,8 @@ static int ext4_da_write_end(struct file *file,
 	 * i_disksize since writeback will push i_disksize upto i_size
 	 * eventually. If the end of the current write is > i_size and
 	 * inside an allocated block (ext4_da_should_update_i_disksize()
-	 * check), we need to update i_disksize here as neither
-	 * ext4_writepage() nor certain ext4_writepages() paths not
-	 * allocating blocks update i_disksize.
+	 * check), we need to update i_disksize here as ext4_writepages() need
+	 * not do it in this case.
 	 *
 	 * Note that we defer inode dirtying to generic_write_end() /
 	 * ext4_da_write_inline_data_end().
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (10 preceding siblings ...)
  2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Remove ordered data support from ext4_writepage() Jan Kara
@ 2022-12-03  0:52 ` Ritesh Harjani
  2022-12-05  9:13   ` Jan Kara
  2022-12-04  6:56 ` Christoph Hellwig
  12 siblings, 1 reply; 20+ messages in thread
From: Ritesh Harjani @ 2022-12-03  0:52 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig

On 22/12/02 07:39PM, Jan Kara wrote:
> Hello,
>
> this patch series modifies ext4 so that we stop using ext4_writepage() for
> writeout of ordered data during transaction commit (through
> generic_writepages() from jbd2_journal_submit_inode_data_buffers()). Instead we
> directly call ext4_writepages() from the
> ext4_journal_submit_inode_data_buffers(). This is part of Christoph's effort
> to get rid of the .writepage() callback in all filesystems.
>
> I have also modified ext4_writepages() to use write_cache_pages() instead of
> generic_writepages() so now we don't expose .writepage hook at all. We still
> keep ext4_writepage() as a callback for write_cache_pages(). We should refactor
> that path as well and get rid of ext4_writepage() completely but that is for a
> separate cleanup. Also note that jbd2 still uses generic_writepages() in its
> jbd2_journal_submit_inode_data_buffers() helper because it is still used from
> OCFS2. Again, something to be dealt with in a separate patchset.
>
> Changes since v1:
> * Added Reviewed-by tags from Ritesh
> * Added patch to get rid of generic_writepages() in ext4_writepages()
> * Added patch to get rid of .writepage hook

Oh! And what about the WARN_ON_ONCE in ext4_writepages() while loop, which we
were discussing here [1]. Do you think that will help in catching anything nasty?

[1]: https://lore.kernel.org/linux-ext4/20221201115500.kbxtteft3v4pzqqx@quack3/T/#mcf7b6cc301062e52a3600194b03a9fd872ba52c5


One thing I guess I missed in my previous review is the fast commit path.
In my overnight testing of previous patch series I observed this warning.

WARNING: CPU: 1 PID: 1746936 at fs/ext4/inode.c:1994 ext4_writepage+0x4e6/0x5e0
RIP: 0010:ext4_writepage+0x4e6/0x5e0
Call Trace:
 <TASK>
 __writepage+0x17/0x70
 write_cache_pages+0x166/0x3c0
 ? dirty_background_bytes_handler+0x30/0x30
 ? finish_task_switch.isra.0+0x8e/0x260
 ? _raw_spin_lock_irqsave+0x19/0x50
 ? finish_wait+0x34/0x70
 ? _raw_spin_unlock_irqrestore+0x1e/0x40
 generic_writepages+0x4f/0x80
 jbd2_journal_submit_inode_data_buffers+0x64/0x90
 ext4_fc_commit+0x2e0/0x830
 ? file_check_and_advance_wb_err+0x2e/0xd0
 ? preempt_count_add+0x70/0xa0
 ext4_sync_file+0x15c/0x380
 __do_sys_msync+0x1c1/0x2a0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

-ritesh


>
> 								Honza
> Previous versions:
> Link: http://lore.kernel.org/r/20221130162435.2324-1-jack@suse.cz # v1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data
  2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (11 preceding siblings ...)
  2022-12-03  0:52 ` [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Ritesh Harjani
@ 2022-12-04  6:56 ` Christoph Hellwig
  12 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-04  6:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig, Ritesh Harjani

On Fri, Dec 02, 2022 at 07:39:25PM +0100, Jan Kara wrote:
> I have also modified ext4_writepages() to use write_cache_pages() instead of
> generic_writepages() so now we don't expose .writepage hook at all. We still
> keep ext4_writepage() as a callback for write_cache_pages().

Nice!

> We should refactor
> that path as well and get rid of ext4_writepage() completely but that is for a
> separate cleanup.

Agreed.

> Also note that jbd2 still uses generic_writepages() in its
> jbd2_journal_submit_inode_data_buffers() helper because it is still used from
> OCFS2. Again, something to be dealt with in a separate patchset.

Indeed.  I think simply moving jbd2_journal_submit_inode_data_buffers to
ocfs2 and then open coding generic_writepages will be the right thing to
do here.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 9/11] ext4: Switch to using write_cache_pages() for data=journal writeout
  2022-12-02 18:39 ` [PATCH v2 9/11] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
@ 2022-12-04  6:58   ` Christoph Hellwig
  2022-12-05 10:07     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-04  6:58 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig, Ritesh Harjani

> +static int __writepage(struct page *page, struct writeback_control *wbc,
> +		       void *data)

Can we give this a somewhat more descriptive name, e.g.
ext4_writepage_cb or so?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/11] ext4: Stop providing .writepage hook
  2022-12-02 18:39 ` [PATCH v2 0/11] ext4: Stop providing .writepage hook Jan Kara
@ 2022-12-04  6:59   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-04  6:59 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig, Ritesh Harjani

On Fri, Dec 02, 2022 at 07:39:35PM +0100, Jan Kara wrote:
> Now we don't need .writepage hook for anything anymore. Reclaim is fine
> with relying on .writepages to clean pages and we often couldn't do much
> from the .writepage callback anyway. We only need to provide
> .migrate_folio callback for the ext4_journalled_aops - let's use
> buffer_migrate_page_norefs() there so that buffers cannot be modified
> under jdb2's hands.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/11] ext4: Remove ordered data support from ext4_writepage()
  2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Remove ordered data support from ext4_writepage() Jan Kara
@ 2022-12-04  7:06   ` Christoph Hellwig
  2022-12-05 10:17     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-04  7:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig, Ritesh Harjani

> + * This function is now used only when journaling data. We cannot start
> + * transaction directly because transaction start ranks above page lock so we
> + * have to do some magic.
>   */
> +static int ext4_writepage(struct page *page, struct writeback_control *wbc,
> +			  void *data)

Maybe call this ext4_journalled_writepage now to make the limitation
more clear?  And/or fold __ext4_journalled_writepage into while we're
at it.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data
  2022-12-03  0:52 ` [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Ritesh Harjani
@ 2022-12-05  9:13   ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-05  9:13 UTC (permalink / raw)
  To: Ritesh Harjani; +Cc: Jan Kara, Ted Tso, linux-ext4, Christoph Hellwig

On Sat 03-12-22 06:22:56, Ritesh Harjani wrote:
> On 22/12/02 07:39PM, Jan Kara wrote:
> > Hello,
> >
> > this patch series modifies ext4 so that we stop using ext4_writepage() for
> > writeout of ordered data during transaction commit (through
> > generic_writepages() from jbd2_journal_submit_inode_data_buffers()). Instead we
> > directly call ext4_writepages() from the
> > ext4_journal_submit_inode_data_buffers(). This is part of Christoph's effort
> > to get rid of the .writepage() callback in all filesystems.
> >
> > I have also modified ext4_writepages() to use write_cache_pages() instead of
> > generic_writepages() so now we don't expose .writepage hook at all. We still
> > keep ext4_writepage() as a callback for write_cache_pages(). We should refactor
> > that path as well and get rid of ext4_writepage() completely but that is for a
> > separate cleanup. Also note that jbd2 still uses generic_writepages() in its
> > jbd2_journal_submit_inode_data_buffers() helper because it is still used from
> > OCFS2. Again, something to be dealt with in a separate patchset.
> >
> > Changes since v1:
> > * Added Reviewed-by tags from Ritesh
> > * Added patch to get rid of generic_writepages() in ext4_writepages()
> > * Added patch to get rid of .writepage hook
> 
> Oh! And what about the WARN_ON_ONCE in ext4_writepages() while loop,
> which we were discussing here [1]. Do you think that will help in
> catching anything nasty?
> 
> [1]: https://lore.kernel.org/linux-ext4/20221201115500.kbxtteft3v4pzqqx@quack3/T/#mcf7b6cc301062e52a3600194b03a9fd872ba52c5

Ah, right. Forgot about this. Thanks for reminder.

> One thing I guess I missed in my previous review is the fast commit path.

Good point, I didn't think about that one :)

> In my overnight testing of previous patch series I observed this warning.
> 
> WARNING: CPU: 1 PID: 1746936 at fs/ext4/inode.c:1994 ext4_writepage+0x4e6/0x5e0
> RIP: 0010:ext4_writepage+0x4e6/0x5e0
> Call Trace:
>  <TASK>
>  __writepage+0x17/0x70
>  write_cache_pages+0x166/0x3c0
>  ? dirty_background_bytes_handler+0x30/0x30
>  ? finish_task_switch.isra.0+0x8e/0x260
>  ? _raw_spin_lock_irqsave+0x19/0x50
>  ? finish_wait+0x34/0x70
>  ? _raw_spin_unlock_irqrestore+0x1e/0x40
>  generic_writepages+0x4f/0x80
>  jbd2_journal_submit_inode_data_buffers+0x64/0x90
>  ext4_fc_commit+0x2e0/0x830
>  ? file_check_and_advance_wb_err+0x2e/0xd0
>  ? preempt_count_add+0x70/0xa0
>  ext4_sync_file+0x15c/0x380
>  __do_sys_msync+0x1c1/0x2a0
>  do_syscall_64+0x38/0x90
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Yep, that path needs conversion as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 9/11] ext4: Switch to using write_cache_pages() for data=journal writeout
  2022-12-04  6:58   ` Christoph Hellwig
@ 2022-12-05 10:07     ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-05 10:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, Ted Tso, linux-ext4, Ritesh Harjani

On Sat 03-12-22 22:58:10, Christoph Hellwig wrote:
> > +static int __writepage(struct page *page, struct writeback_control *wbc,
> > +		       void *data)
> 
> Can we give this a somewhat more descriptive name, e.g.
> ext4_writepage_cb or so?

OK, done. Thanks for review!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/11] ext4: Remove ordered data support from ext4_writepage()
  2022-12-04  7:06   ` Christoph Hellwig
@ 2022-12-05 10:17     ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-12-05 10:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, Ted Tso, linux-ext4, Ritesh Harjani

On Sat 03-12-22 23:06:47, Christoph Hellwig wrote:
> > + * This function is now used only when journaling data. We cannot start
> > + * transaction directly because transaction start ranks above page lock so we
> > + * have to do some magic.
> >   */
> > +static int ext4_writepage(struct page *page, struct writeback_control *wbc,
> > +			  void *data)
> 
> Maybe call this ext4_journalled_writepage now to make the limitation
> more clear?  And/or fold __ext4_journalled_writepage into while we're
> at it.

Yeah, I've renamed it to ext4_journalled_writepage. I've refrained from the
folding of __ext4_journalled_writepage() because that is not completely
trivial (due to goto labels, variable naming...) so it would require
separate patch and I'll just spare that for the cleanup of the whole
journalled writeout path.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-12-05 10:17 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-02 18:39 [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
2022-12-02 18:39 ` [PATCH v2 2/11] ext4: Move keep_towrite handling to ext4_bio_write_page() Jan Kara
2022-12-02 18:39 ` [PATCH v2 3/11] ext4: Remove nr_submitted from ext4_bio_write_page() Jan Kara
2022-12-02 18:39 ` [PATCH v2 4/11] ext4: Drop pointless IO submission " Jan Kara
2022-12-02 18:39 ` [PATCH v2 5/11] ext4: Add support for writepages calls that cannot map blocks Jan Kara
2022-12-02 18:39 ` [PATCH v2 6/11] ext4: Provide ext4_do_writepages() Jan Kara
2022-12-02 18:39 ` [PATCH v2 7/11] ext4: Move percpu_rwsem protection into ext4_writepages() Jan Kara
2022-12-02 18:39 ` [PATCH v2 8/11] ext4: Switch to using ext4_do_writepages() for ordered data writeout Jan Kara
2022-12-02 18:39 ` [PATCH v2 9/11] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
2022-12-04  6:58   ` Christoph Hellwig
2022-12-05 10:07     ` Jan Kara
2022-12-02 18:39 ` [PATCH v2 0/11] ext4: Stop providing .writepage hook Jan Kara
2022-12-04  6:59   ` Christoph Hellwig
2022-12-02 18:39 ` [PATCH v2 1/11] ext4: Remove ordered data support from ext4_writepage() Jan Kara
2022-12-04  7:06   ` Christoph Hellwig
2022-12-05 10:17     ` Jan Kara
2022-12-03  0:52 ` [PATCH v2 0/11] ext4: Stop using ext4_writepage() for writeout of ordered data Ritesh Harjani
2022-12-05  9:13   ` Jan Kara
2022-12-04  6:56 ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.