linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data
@ 2022-12-07 11:27 Jan Kara
  2022-12-07 11:27 ` [PATCH v4 01/13] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

Hello,

this patch series modifies ext4 so that we stop using ext4_writepage() for
writeout of ordered data during transaction commit (through
generic_writepages() from jbd2_journal_submit_inode_data_buffers()). Instead we
directly call ext4_writepages() from the
ext4_journal_submit_inode_data_buffers(). This is part of Christoph's effort
to get rid of the .writepage() callback in all filesystems.

I have also modified ext4_writepages() to use write_cache_pages() instead of
generic_writepages() so now we don't expose .writepage hook at all. We still
keep ext4_writepage() as a callback for write_cache_pages(). We should refactor
that path as well and get rid of ext4_writepage() completely but that is for a
separate cleanup. Also note that jbd2 still uses generic_writepages() in its
jbd2_journal_submit_inode_data_buffers() helper because it is still used from
OCFS2. Again, something to be dealt with in a separate patchset.

Changes since v3:
* Added export of buffer_migrate_page_norefs()
* Clarified commit message about page migration a bit

Changes since v2:
* Added Reviewed-by tags from Christoph
* Added WARN_ON to verify we're not trying to start transaction from
  transaction commit writeback
* Converted fastcommit path to use ext4_writepages() for data writeout
* Some minor tweaks suggested by Christoph

Changes since v1:
* Added Reviewed-by tags from Ritesh
* Added patch to get rid of generic_writepages() in ext4_writepages()
* Added patch to get rid of .writepage hook

								Honza
Previous versions:
Link: http://lore.kernel.org/r/20221130162435.2324-1-jack@suse.cz # v1
Link: http://lore.kernel.org/r/20221202163815.22928-1-jack@suse.cz # v2
Link: http://lore.kernel.org/r/20221205122604.25994-1-jack@suse.cz # v3

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v4 01/13] ext4: Handle redirtying in ext4_bio_write_page()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 02/13] ext4: Move keep_towrite handling to ext4_bio_write_page() Jan Kara
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

Since we want to transition transaction commits to use ext4_writepages()
for writing back ordered, add handling of page redirtying into
ext4_bio_write_page(). Also move buffer dirty bit clearing into the same
place other buffer state handling.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 97fa7b4c645f..4e68ace86f11 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -482,6 +482,13 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			/* A hole? We can safely clear the dirty bit */
 			if (!buffer_mapped(bh))
 				clear_buffer_dirty(bh);
+			/*
+			 * Keeping dirty some buffer we cannot write? Make
+			 * sure to redirty the page. This happens e.g. when
+			 * doing writeout for transaction commit.
+			 */
+			if (buffer_dirty(bh) && !PageDirty(page))
+				redirty_page_for_writepage(wbc, page);
 			if (io->io_bio)
 				ext4_io_submit(io);
 			continue;
@@ -489,6 +496,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		if (buffer_new(bh))
 			clear_buffer_new(bh);
 		set_buffer_async_write(bh);
+		clear_buffer_dirty(bh);
 		nr_to_submit++;
 	} while ((bh = bh->b_this_page) != head);
 
@@ -532,7 +540,10 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			printk_ratelimited(KERN_ERR "%s: ret = %d\n", __func__, ret);
 			redirty_page_for_writepage(wbc, page);
 			do {
-				clear_buffer_async_write(bh);
+				if (buffer_async_write(bh)) {
+					clear_buffer_async_write(bh);
+					set_buffer_dirty(bh);
+				}
 				bh = bh->b_this_page;
 			} while (bh != head);
 			goto unlock;
@@ -546,7 +557,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		io_submit_add_bh(io, inode,
 				 bounce_page ? bounce_page : page, bh);
 		nr_submitted++;
-		clear_buffer_dirty(bh);
 	} while ((bh = bh->b_this_page) != head);
 
 unlock:
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 02/13] ext4: Move keep_towrite handling to ext4_bio_write_page()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
  2022-12-07 11:27 ` [PATCH v4 01/13] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 03/13] ext4: Remove nr_submitted from ext4_bio_write_page() Jan Kara
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

When we are writing back page but we cannot for some reason write all
its buffers (e.g. because we cannot allocate blocks in current context) we
have to keep TOWRITE tag set in the mapping as otherwise racing
WB_SYNC_ALL writeback that could write these buffers can skip the page
and result in data loss. We will need this logic for writeback during
transaction commit so move the logic from ext4_writepage() to
ext4_bio_write_page().

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h    |  3 +--
 fs/ext4/inode.c   |  6 ++----
 fs/ext4/page-io.c | 36 +++++++++++++++++++++---------------
 3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8d5453852f98..1b3bffc04fd0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3756,8 +3756,7 @@ extern void ext4_end_io_rsv_work(struct work_struct *work);
 extern void ext4_io_submit(struct ext4_io_submit *io);
 extern int ext4_bio_write_page(struct ext4_io_submit *io,
 			       struct page *page,
-			       int len,
-			       bool keep_towrite);
+			       int len);
 extern struct ext4_io_end_vec *ext4_alloc_io_end_vec(ext4_io_end_t *io_end);
 extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end);
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2b5ef1b64249..43eb175d0c1c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2009,7 +2009,6 @@ static int ext4_writepage(struct page *page,
 	struct buffer_head *page_bufs = NULL;
 	struct inode *inode = page->mapping->host;
 	struct ext4_io_submit io_submit;
-	bool keep_towrite = false;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) {
 		folio_invalidate(folio, 0, folio_size(folio));
@@ -2067,7 +2066,6 @@ static int ext4_writepage(struct page *page,
 			unlock_page(page);
 			return 0;
 		}
-		keep_towrite = true;
 	}
 
 	if (PageChecked(page) && ext4_should_journal_data(inode))
@@ -2084,7 +2082,7 @@ static int ext4_writepage(struct page *page,
 		unlock_page(page);
 		return -ENOMEM;
 	}
-	ret = ext4_bio_write_page(&io_submit, page, len, keep_towrite);
+	ret = ext4_bio_write_page(&io_submit, page, len);
 	ext4_io_submit(&io_submit);
 	/* Drop io_end reference we got from init */
 	ext4_put_io_end_defer(io_submit.io_end);
@@ -2118,7 +2116,7 @@ static int mpage_submit_page(struct mpage_da_data *mpd, struct page *page)
 		len = size & ~PAGE_MASK;
 	else
 		len = PAGE_SIZE;
-	err = ext4_bio_write_page(&mpd->io_submit, page, len, false);
+	err = ext4_bio_write_page(&mpd->io_submit, page, len);
 	if (!err)
 		mpd->wbc->nr_to_write--;
 	mpd->first_page++;
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4e68ace86f11..4f9ecacd10aa 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -430,8 +430,7 @@ static void io_submit_add_bh(struct ext4_io_submit *io,
 
 int ext4_bio_write_page(struct ext4_io_submit *io,
 			struct page *page,
-			int len,
-			bool keep_towrite)
+			int len)
 {
 	struct page *bounce_page = NULL;
 	struct inode *inode = page->mapping->host;
@@ -441,14 +440,11 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 	int nr_submitted = 0;
 	int nr_to_submit = 0;
 	struct writeback_control *wbc = io->io_wbc;
+	bool keep_towrite = false;
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(PageWriteback(page));
 
-	if (keep_towrite)
-		set_page_writeback_keepwrite(page);
-	else
-		set_page_writeback(page);
 	ClearPageError(page);
 
 	/*
@@ -483,12 +479,17 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			if (!buffer_mapped(bh))
 				clear_buffer_dirty(bh);
 			/*
-			 * Keeping dirty some buffer we cannot write? Make
-			 * sure to redirty the page. This happens e.g. when
-			 * doing writeout for transaction commit.
+			 * Keeping dirty some buffer we cannot write? Make sure
+			 * to redirty the page and keep TOWRITE tag so that
+			 * racing WB_SYNC_ALL writeback does not skip the page.
+			 * This happens e.g. when doing writeout for
+			 * transaction commit.
 			 */
-			if (buffer_dirty(bh) && !PageDirty(page))
-				redirty_page_for_writepage(wbc, page);
+			if (buffer_dirty(bh)) {
+				if (!PageDirty(page))
+					redirty_page_for_writepage(wbc, page);
+				keep_towrite = true;
+			}
 			if (io->io_bio)
 				ext4_io_submit(io);
 			continue;
@@ -500,6 +501,10 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		nr_to_submit++;
 	} while ((bh = bh->b_this_page) != head);
 
+	/* Nothing to submit? Just unlock the page... */
+	if (!nr_to_submit)
+		goto unlock;
+
 	bh = head = page_buffers(page);
 
 	/*
@@ -550,6 +555,11 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		}
 	}
 
+	if (keep_towrite)
+		set_page_writeback_keepwrite(page);
+	else
+		set_page_writeback(page);
+
 	/* Now submit buffers to write */
 	do {
 		if (!buffer_async_write(bh))
@@ -558,11 +568,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 				 bounce_page ? bounce_page : page, bh);
 		nr_submitted++;
 	} while ((bh = bh->b_this_page) != head);
-
 unlock:
 	unlock_page(page);
-	/* Nothing submitted - we have to end page writeback */
-	if (!nr_submitted)
-		end_page_writeback(page);
 	return ret;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 03/13] ext4: Remove nr_submitted from ext4_bio_write_page()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
  2022-12-07 11:27 ` [PATCH v4 01/13] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
  2022-12-07 11:27 ` [PATCH v4 02/13] ext4: Move keep_towrite handling to ext4_bio_write_page() Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 04/13] ext4: Drop pointless IO submission " Jan Kara
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

nr_submitted is the same as nr_to_submit. Drop one of them.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4f9ecacd10aa..2bdfb8a046d9 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -437,7 +437,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 	unsigned block_start;
 	struct buffer_head *bh, *head;
 	int ret = 0;
-	int nr_submitted = 0;
 	int nr_to_submit = 0;
 	struct writeback_control *wbc = io->io_wbc;
 	bool keep_towrite = false;
@@ -566,7 +565,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 			continue;
 		io_submit_add_bh(io, inode,
 				 bounce_page ? bounce_page : page, bh);
-		nr_submitted++;
 	} while ((bh = bh->b_this_page) != head);
 unlock:
 	unlock_page(page);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 04/13] ext4: Drop pointless IO submission from ext4_bio_write_page()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (2 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 03/13] ext4: Remove nr_submitted from ext4_bio_write_page() Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 05/13] ext4: Add support for writepages calls that cannot map blocks Jan Kara
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

We submit outstanding IO in ext4_bio_write_page() if we find a buffer we
are not going to write. This is however pointless because we already
handle submission of previous IO in case we detect newly added buffer
head is discontiguous. So just delete the pointless IO submission call.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2bdfb8a046d9..beaec6d81074 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -489,8 +489,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 					redirty_page_for_writepage(wbc, page);
 				keep_towrite = true;
 			}
-			if (io->io_bio)
-				ext4_io_submit(io);
 			continue;
 		}
 		if (buffer_new(bh))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 05/13] ext4: Add support for writepages calls that cannot map blocks
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (3 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 04/13] ext4: Drop pointless IO submission " Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 06/13] ext4: Provide ext4_do_writepages() Jan Kara
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

Add support for calls to ext4_writepages() than cannot map blocks. These
will be issued from jbd2 transaction commit code.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 62 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 43eb175d0c1c..9ba843b7a92c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1557,6 +1557,7 @@ struct mpage_da_data {
 	struct ext4_map_blocks map;
 	struct ext4_io_submit io_submit;	/* IO submission data */
 	unsigned int do_map:1;
+	unsigned int can_map:1;	/* Can writepages call map blocks? */
 	unsigned int scanned_until_end:1;
 };
 
@@ -2549,18 +2550,33 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)
 				MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
 }
 
+/* Return true if the page needs to be written as part of transaction commit */
+static bool ext4_page_nomap_can_writeout(struct page *page)
+{
+	struct buffer_head *bh, *head;
+
+	bh = head = page_buffers(page);
+	do {
+		if (buffer_dirty(bh) && buffer_mapped(bh) && !buffer_delay(bh))
+			return true;
+	} while ((bh = bh->b_this_page) != head);
+	return false;
+}
+
 /*
  * mpage_prepare_extent_to_map - find & lock contiguous range of dirty pages
- * 				 and underlying extent to map
+ * 				 needing mapping, submit mapped pages
  *
  * @mpd - where to look for pages
  *
  * Walk dirty pages in the mapping. If they are fully mapped, submit them for
- * IO immediately. When we find a page which isn't mapped we start accumulating
- * extent of buffers underlying these pages that needs mapping (formed by
- * either delayed or unwritten buffers). We also lock the pages containing
- * these buffers. The extent found is returned in @mpd structure (starting at
- * mpd->lblk with length mpd->len blocks).
+ * IO immediately. If we cannot map blocks, we submit just already mapped
+ * buffers in the page for IO and keep page dirty. When we can map blocks and
+ * we find a page which isn't mapped we start accumulating extent of buffers
+ * underlying these pages that needs mapping (formed by either delayed or
+ * unwritten buffers). We also lock the pages containing these buffers. The
+ * extent found is returned in @mpd structure (starting at mpd->lblk with
+ * length mpd->len blocks).
  *
  * Note that this function can attach bios to one io_end structure which are
  * neither logically nor physically contiguous. Although it may seem as an
@@ -2651,14 +2667,30 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (mpd->map.m_len == 0)
 				mpd->first_page = page->index;
 			mpd->next_page = page->index + 1;
-			/* Add all dirty buffers to mpd */
-			lblk = ((ext4_lblk_t)page->index) <<
-				(PAGE_SHIFT - blkbits);
-			head = page_buffers(page);
-			err = mpage_process_page_bufs(mpd, head, head, lblk);
-			if (err <= 0)
-				goto out;
-			err = 0;
+			/*
+			 * Writeout for transaction commit where we cannot
+			 * modify metadata is simple. Just submit the page.
+			 */
+			if (!mpd->can_map) {
+				if (ext4_page_nomap_can_writeout(page)) {
+					err = mpage_submit_page(mpd, page);
+					if (err < 0)
+						goto out;
+				} else {
+					unlock_page(page);
+					mpd->first_page++;
+				}
+			} else {
+				/* Add all dirty buffers to mpd */
+				lblk = ((ext4_lblk_t)page->index) <<
+					(PAGE_SHIFT - blkbits);
+				head = page_buffers(page);
+				err = mpage_process_page_bufs(mpd, head, head,
+							      lblk);
+				if (err <= 0)
+					goto out;
+				err = 0;
+			}
 			left--;
 		}
 		pagevec_release(&pvec);
@@ -2778,6 +2810,7 @@ static int ext4_writepages(struct address_space *mapping,
 	 */
 	mpd.do_map = 0;
 	mpd.scanned_until_end = 0;
+	mpd.can_map = 1;
 	mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
 	if (!mpd.io_submit.io_end) {
 		ret = -ENOMEM;
@@ -2801,6 +2834,7 @@ static int ext4_writepages(struct address_space *mapping,
 			break;
 		}
 
+		WARN_ON_ONCE(!mpd->can_map);
 		/*
 		 * We have two constraints: We find one extent to map and we
 		 * must always write out whole page (makes a difference when
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 06/13] ext4: Provide ext4_do_writepages()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (4 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 05/13] ext4: Add support for writepages calls that cannot map blocks Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 07/13] ext4: Move percpu_rwsem protection into ext4_writepages() Jan Kara
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

Provide ext4_do_writepages() function that takes mpage_da_data as an
argument and make ext4_writepages() just a simple wrapper around it. No
functional changes.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 96 +++++++++++++++++++++++++++----------------------
 1 file changed, 54 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9ba843b7a92c..99c66c768cd0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1543,9 +1543,12 @@ void ext4_da_release_space(struct inode *inode, int to_free)
  */
 
 struct mpage_da_data {
+	/* These are input fields for ext4_do_writepages() */
 	struct inode *inode;
 	struct writeback_control *wbc;
+	unsigned int can_map:1;	/* Can writepages call map blocks? */
 
+	/* These are internal state of ext4_do_writepages() */
 	pgoff_t first_page;	/* The first page to write */
 	pgoff_t next_page;	/* Current page to examine */
 	pgoff_t last_page;	/* Last page to examine */
@@ -1557,7 +1560,6 @@ struct mpage_da_data {
 	struct ext4_map_blocks map;
 	struct ext4_io_submit io_submit;	/* IO submission data */
 	unsigned int do_map:1;
-	unsigned int can_map:1;	/* Can writepages call map blocks? */
 	unsigned int scanned_until_end:1;
 };
 
@@ -2703,16 +2705,16 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	return err;
 }
 
-static int ext4_writepages(struct address_space *mapping,
-			   struct writeback_control *wbc)
+static int ext4_do_writepages(struct mpage_da_data *mpd)
 {
+	struct writeback_control *wbc = mpd->wbc;
 	pgoff_t	writeback_index = 0;
 	long nr_to_write = wbc->nr_to_write;
 	int range_whole = 0;
 	int cycled = 1;
 	handle_t *handle = NULL;
-	struct mpage_da_data mpd;
-	struct inode *inode = mapping->host;
+	struct inode *inode = mpd->inode;
+	struct address_space *mapping = inode->i_mapping;
 	int needed_blocks, rsv_blocks = 0, ret = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
 	struct blk_plug plug;
@@ -2787,19 +2789,18 @@ static int ext4_writepages(struct address_space *mapping,
 		writeback_index = mapping->writeback_index;
 		if (writeback_index)
 			cycled = 0;
-		mpd.first_page = writeback_index;
-		mpd.last_page = -1;
+		mpd->first_page = writeback_index;
+		mpd->last_page = -1;
 	} else {
-		mpd.first_page = wbc->range_start >> PAGE_SHIFT;
-		mpd.last_page = wbc->range_end >> PAGE_SHIFT;
+		mpd->first_page = wbc->range_start >> PAGE_SHIFT;
+		mpd->last_page = wbc->range_end >> PAGE_SHIFT;
 	}
 
-	mpd.inode = inode;
-	mpd.wbc = wbc;
-	ext4_io_submit_init(&mpd.io_submit, wbc);
+	ext4_io_submit_init(&mpd->io_submit, wbc);
 retry:
 	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
-		tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
+		tag_pages_for_writeback(mapping, mpd->first_page,
+					mpd->last_page);
 	blk_start_plug(&plug);
 
 	/*
@@ -2808,28 +2809,27 @@ static int ext4_writepages(struct address_space *mapping,
 	 * in the block layer on device congestion while having transaction
 	 * started.
 	 */
-	mpd.do_map = 0;
-	mpd.scanned_until_end = 0;
-	mpd.can_map = 1;
-	mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
-	if (!mpd.io_submit.io_end) {
+	mpd->do_map = 0;
+	mpd->scanned_until_end = 0;
+	mpd->io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+	if (!mpd->io_submit.io_end) {
 		ret = -ENOMEM;
 		goto unplug;
 	}
-	ret = mpage_prepare_extent_to_map(&mpd);
+	ret = mpage_prepare_extent_to_map(mpd);
 	/* Unlock pages we didn't use */
-	mpage_release_unused_pages(&mpd, false);
+	mpage_release_unused_pages(mpd, false);
 	/* Submit prepared bio */
-	ext4_io_submit(&mpd.io_submit);
-	ext4_put_io_end_defer(mpd.io_submit.io_end);
-	mpd.io_submit.io_end = NULL;
+	ext4_io_submit(&mpd->io_submit);
+	ext4_put_io_end_defer(mpd->io_submit.io_end);
+	mpd->io_submit.io_end = NULL;
 	if (ret < 0)
 		goto unplug;
 
-	while (!mpd.scanned_until_end && wbc->nr_to_write > 0) {
+	while (!mpd->scanned_until_end && wbc->nr_to_write > 0) {
 		/* For each extent of pages we use new io_end */
-		mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
-		if (!mpd.io_submit.io_end) {
+		mpd->io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+		if (!mpd->io_submit.io_end) {
 			ret = -ENOMEM;
 			break;
 		}
@@ -2854,16 +2854,16 @@ static int ext4_writepages(struct address_space *mapping,
 			       "%ld pages, ino %lu; err %d", __func__,
 				wbc->nr_to_write, inode->i_ino, ret);
 			/* Release allocated io_end */
-			ext4_put_io_end(mpd.io_submit.io_end);
-			mpd.io_submit.io_end = NULL;
+			ext4_put_io_end(mpd->io_submit.io_end);
+			mpd->io_submit.io_end = NULL;
 			break;
 		}
-		mpd.do_map = 1;
+		mpd->do_map = 1;
 
-		trace_ext4_da_write_pages(inode, mpd.first_page, mpd.wbc);
-		ret = mpage_prepare_extent_to_map(&mpd);
-		if (!ret && mpd.map.m_len)
-			ret = mpage_map_and_submit_extent(handle, &mpd,
+		trace_ext4_da_write_pages(inode, mpd->first_page, wbc);
+		ret = mpage_prepare_extent_to_map(mpd);
+		if (!ret && mpd->map.m_len)
+			ret = mpage_map_and_submit_extent(handle, mpd,
 					&give_up_on_write);
 		/*
 		 * Caution: If the handle is synchronous,
@@ -2878,12 +2878,12 @@ static int ext4_writepages(struct address_space *mapping,
 		if (!ext4_handle_valid(handle) || handle->h_sync == 0) {
 			ext4_journal_stop(handle);
 			handle = NULL;
-			mpd.do_map = 0;
+			mpd->do_map = 0;
 		}
 		/* Unlock pages we didn't use */
-		mpage_release_unused_pages(&mpd, give_up_on_write);
+		mpage_release_unused_pages(mpd, give_up_on_write);
 		/* Submit prepared bio */
-		ext4_io_submit(&mpd.io_submit);
+		ext4_io_submit(&mpd->io_submit);
 
 		/*
 		 * Drop our io_end reference we got from init. We have
@@ -2893,11 +2893,11 @@ static int ext4_writepages(struct address_space *mapping,
 		 * up doing unwritten extent conversion.
 		 */
 		if (handle) {
-			ext4_put_io_end_defer(mpd.io_submit.io_end);
+			ext4_put_io_end_defer(mpd->io_submit.io_end);
 			ext4_journal_stop(handle);
 		} else
-			ext4_put_io_end(mpd.io_submit.io_end);
-		mpd.io_submit.io_end = NULL;
+			ext4_put_io_end(mpd->io_submit.io_end);
+		mpd->io_submit.io_end = NULL;
 
 		if (ret == -ENOSPC && sbi->s_journal) {
 			/*
@@ -2917,8 +2917,8 @@ static int ext4_writepages(struct address_space *mapping,
 	blk_finish_plug(&plug);
 	if (!ret && !cycled && wbc->nr_to_write > 0) {
 		cycled = 1;
-		mpd.last_page = writeback_index - 1;
-		mpd.first_page = 0;
+		mpd->last_page = writeback_index - 1;
+		mpd->first_page = 0;
 		goto retry;
 	}
 
@@ -2928,7 +2928,7 @@ static int ext4_writepages(struct address_space *mapping,
 		 * Set the writeback_index so that range_cyclic
 		 * mode will write it back later
 		 */
-		mapping->writeback_index = mpd.first_page;
+		mapping->writeback_index = mpd->first_page;
 
 out_writepages:
 	trace_ext4_writepages_result(inode, wbc, ret,
@@ -2937,6 +2937,18 @@ static int ext4_writepages(struct address_space *mapping,
 	return ret;
 }
 
+static int ext4_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	struct mpage_da_data mpd = {
+		.inode = mapping->host,
+		.wbc = wbc,
+		.can_map = 1,
+	};
+
+	return ext4_do_writepages(&mpd);
+}
+
 static int ext4_dax_writepages(struct address_space *mapping,
 			       struct writeback_control *wbc)
 {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 07/13] ext4: Move percpu_rwsem protection into ext4_writepages()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (5 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 06/13] ext4: Provide ext4_do_writepages() Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 08/13] ext4: Switch to using ext4_do_writepages() for ordered data writeout Jan Kara
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

Move protection by percpu_rwsem from ext4_do_writepages() to
ext4_writepages(). We will not want to grab this protection during
transaction commits as that would be prone to deadlocks and the
protection is not needed. Move the shutdown state checking as well since
we want to be able to complete commit while the shutdown is in progress.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 99c66c768cd0..4f8f4959524c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2720,10 +2720,6 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 	struct blk_plug plug;
 	bool give_up_on_write = false;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
-		return -EIO;
-
-	percpu_down_read(&sbi->s_writepages_rwsem);
 	trace_ext4_writepages(inode, wbc);
 
 	/*
@@ -2933,20 +2929,28 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 out_writepages:
 	trace_ext4_writepages_result(inode, wbc, ret,
 				     nr_to_write - wbc->nr_to_write);
-	percpu_up_read(&sbi->s_writepages_rwsem);
 	return ret;
 }
 
 static int ext4_writepages(struct address_space *mapping,
 			   struct writeback_control *wbc)
 {
+	struct super_block *sb = mapping->host->i_sb;
 	struct mpage_da_data mpd = {
 		.inode = mapping->host,
 		.wbc = wbc,
 		.can_map = 1,
 	};
+	int ret;
+
+	if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
+		return -EIO;
 
-	return ext4_do_writepages(&mpd);
+	percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
+	ret = ext4_do_writepages(&mpd);
+	percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
+
+	return ret;
 }
 
 static int ext4_dax_writepages(struct address_space *mapping,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 08/13] ext4: Switch to using ext4_do_writepages() for ordered data writeout
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (6 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 07/13] ext4: Move percpu_rwsem protection into ext4_writepages() Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:27 ` [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for " Jan Kara
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

Use the standard writepages method (ext4_do_writepages()) to perform
writeout of ordered data during journal commit.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/inode.c | 16 ++++++++++++++++
 fs/ext4/super.c |  3 +--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1b3bffc04fd0..07b55cc48578 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2999,6 +2999,7 @@ extern void ext4_set_inode_flags(struct inode *, bool init);
 extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
+extern int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 			     loff_t lstart, loff_t lend);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4f8f4959524c..b93a436bf5bc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2953,6 +2953,22 @@ static int ext4_writepages(struct address_space *mapping,
 	return ret;
 }
 
+int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = LONG_MAX,
+		.range_start = jinode->i_dirty_start,
+		.range_end = jinode->i_dirty_end,
+	};
+	struct mpage_da_data mpd = {
+		.inode = jinode->i_vfs_inode,
+		.wbc = &wbc,
+		.can_map = 0,
+	};
+	return ext4_do_writepages(&mpd);
+}
+
 static int ext4_dax_writepages(struct address_space *mapping,
 			       struct writeback_control *wbc)
 {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7cdd2138c897..c02329dd7574 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -540,8 +540,7 @@ static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
 	if (ext4_should_journal_data(jinode->i_vfs_inode))
 		ret = ext4_journalled_submit_inode_data_buffers(jinode);
 	else
-		ret = jbd2_journal_submit_inode_data_buffers(jinode);
-
+		ret = ext4_normal_submit_inode_data_buffers(jinode);
 	return ret;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for data writeout
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (7 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 08/13] ext4: Switch to using ext4_do_writepages() for ordered data writeout Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-08 15:28   ` Ritesh Harjani (IBM)
  2022-12-07 11:27 ` [PATCH v4 10/13] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

jbd2_submit_inode_data() hardcoded use of
jbd2_journal_submit_inode_data_buffers() for submission of data pages.
Make it use j_submit_inode_data_buffers hook instead. This effectively
switches ext4 fastcommits to use ext4_writepages() for data writeout
instead of generic_writepages().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/fast_commit.c | 2 +-
 fs/jbd2/commit.c      | 5 ++---
 include/linux/jbd2.h  | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 0f6d0a80467d..7c6694593497 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -986,7 +986,7 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
 			finish_wait(&ei->i_fc_wait, &wait);
 		}
 		spin_unlock(&sbi->s_fc_lock);
-		ret = jbd2_submit_inode_data(ei->jinode);
+		ret = jbd2_submit_inode_data(journal, ei->jinode);
 		if (ret)
 			return ret;
 		spin_lock(&sbi->s_fc_lock);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 885a7a6cc53e..4810438b7856 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -207,14 +207,13 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
 }
 
 /* Send all the data buffers related to an inode */
-int jbd2_submit_inode_data(struct jbd2_inode *jinode)
+int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode)
 {
-
 	if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
 		return 0;
 
 	trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
-	return jbd2_journal_submit_inode_data_buffers(jinode);
+	return journal->j_submit_inode_data_buffers(jinode);
 
 }
 EXPORT_SYMBOL(jbd2_submit_inode_data);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 0b7242370b56..2170e0cc279d 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1662,7 +1662,7 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
 int jbd2_fc_end_commit(journal_t *journal);
 int jbd2_fc_end_commit_fallback(journal_t *journal);
 int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out);
-int jbd2_submit_inode_data(struct jbd2_inode *jinode);
+int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode);
 int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
 int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
 int jbd2_fc_release_bufs(journal_t *journal);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 10/13] ext4: Switch to using write_cache_pages() for data=journal writeout
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (8 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for " Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-08 15:40   ` Ritesh Harjani (IBM)
  2022-12-07 11:27 ` [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs() Jan Kara
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso
  Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM),
	Jan Kara, Christoph Hellwig

Instead of using generic_writepages(), let's use write_cache_pages() for
writeout of journalled data. It will allow us to stop providing
.writepage callback. Our data=journal writeback path would benefit from
a larger cleanup and refactoring but that's for a separate cleanup
series.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b93a436bf5bc..f3b3792c1c96 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2705,6 +2705,12 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	return err;
 }
 
+static int ext4_writepage_cb(struct page *page, struct writeback_control *wbc,
+			     void *data)
+{
+	return ext4_writepage(page, wbc);
+}
+
 static int ext4_do_writepages(struct mpage_da_data *mpd)
 {
 	struct writeback_control *wbc = mpd->wbc;
@@ -2731,7 +2737,9 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 		goto out_writepages;
 
 	if (ext4_should_journal_data(inode)) {
-		ret = generic_writepages(mapping, wbc);
+		blk_start_plug(&plug);
+		ret = write_cache_pages(mapping, wbc, ext4_writepage_cb, NULL);
+		blk_finish_plug(&plug);
 		goto out_writepages;
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (9 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 10/13] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-07 11:59   ` Christoph Hellwig
  2022-12-07 11:27 ` [PATCH v4 12/13] ext4: Stop providing .writepage hook Jan Kara
  2022-12-07 11:27 ` [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage() Jan Kara
  12 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Jan Kara

Ext4 needs this function to allow safe migration for journalled data
pages.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/migrate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..5e4ca21da712 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -820,6 +820,7 @@ int buffer_migrate_folio_norefs(struct address_space *mapping,
 {
 	return __buffer_migrate_folio(mapping, dst, src, mode, true);
 }
+EXPORT_SYMBOL_GPL(buffer_migrate_folio_norefs);
 #endif
 
 int filemap_migrate_folio(struct address_space *mapping,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (10 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs() Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2023-05-08 17:51   ` youling257
  2022-12-07 11:27 ` [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage() Jan Kara
  12 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso
  Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM),
	Jan Kara, Christoph Hellwig

Now we don't need .writepage hook for anything anymore. Reclaim is fine
with relying on .writepages to clean pages and we often couldn't do much
from the .writepage callback anyway. We only need to provide
.migrate_folio callback for the ext4_journalled_aops - let's use
buffer_migrate_page_norefs() there so that buffers cannot be modified
under jdb2's hands as that can cause data corruption. For example when
commit code does writeout of transaction buffers in
jbd2_journal_write_metadata_buffer(), we don't hold page lock or have
page writeback bit set or have the buffer locked. So page migration code
would go and happily migrate the page elsewhere while the copy is
running thus corrupting data.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f3b3792c1c96..acf9d23c1cfb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3718,7 +3718,6 @@ static int ext4_iomap_swap_activate(struct swap_info_struct *sis,
 static const struct address_space_operations ext4_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
-	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_write_end,
@@ -3736,7 +3735,6 @@ static const struct address_space_operations ext4_aops = {
 static const struct address_space_operations ext4_journalled_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
-	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_journalled_write_end,
@@ -3745,6 +3743,7 @@ static const struct address_space_operations ext4_journalled_aops = {
 	.invalidate_folio	= ext4_journalled_invalidate_folio,
 	.release_folio		= ext4_release_folio,
 	.direct_IO		= noop_direct_IO,
+	.migrate_folio		= buffer_migrate_folio_norefs,
 	.is_partially_uptodate  = block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 	.swap_activate		= ext4_iomap_swap_activate,
@@ -3753,7 +3752,6 @@ static const struct address_space_operations ext4_journalled_aops = {
 static const struct address_space_operations ext4_da_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
-	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_da_write_begin,
 	.write_end		= ext4_da_write_end,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage()
  2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
                   ` (11 preceding siblings ...)
  2022-12-07 11:27 ` [PATCH v4 12/13] ext4: Stop providing .writepage hook Jan Kara
@ 2022-12-07 11:27 ` Jan Kara
  2022-12-08 14:23   ` Theodore Ts'o
  12 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2022-12-07 11:27 UTC (permalink / raw)
  To: Ted Tso
  Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM),
	Jan Kara, Christoph Hellwig

ext4_writepage() should not be called for ordered data anymore. Remove
support for it from the function.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 150 +++++++++---------------------------------------
 1 file changed, 26 insertions(+), 124 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index acf9d23c1cfb..c62614f6eacf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1006,11 +1006,6 @@ int ext4_walk_page_buffers(handle_t *handle, struct inode *inode,
  * and the commit_write().  So doing the jbd2_journal_start at the start of
  * prepare_write() is the right place.
  *
- * Also, this function can nest inside ext4_writepage().  In that case, we
- * *know* that ext4_writepage() has generated enough buffer credits to do the
- * whole page.  So we won't block on the journal in that case, which is good,
- * because the caller may be PF_MEMALLOC.
- *
  * By accident, ext4 can be reentered when a transaction is open via
  * quota file writes.  If we were to commit the transaction while thus
  * reentered, there can be a deadlock - we would be holding a quota
@@ -1642,12 +1637,6 @@ static void ext4_print_free_blocks(struct inode *inode)
 	return;
 }
 
-static int ext4_bh_delay_or_unwritten(handle_t *handle, struct inode *inode,
-				      struct buffer_head *bh)
-{
-	return (buffer_delay(bh) || buffer_unwritten(bh)) && buffer_dirty(bh);
-}
-
 /*
  * ext4_insert_delayed_block - adds a delayed block to the extents status
  *                             tree, incrementing the reserved cluster/block
@@ -1962,56 +1951,18 @@ static int __ext4_journalled_writepage(struct page *page,
 }
 
 /*
- * Note that we don't need to start a transaction unless we're journaling data
- * because we should have holes filled from ext4_page_mkwrite(). We even don't
- * need to file the inode to the transaction's list in ordered mode because if
- * we are writing back data added by write(), the inode is already there and if
- * we are writing back data modified via mmap(), no one guarantees in which
- * transaction the data will hit the disk. In case we are journaling data, we
- * cannot start transaction directly because transaction start ranks above page
- * lock so we have to do some magic.
- *
- * This function can get called via...
- *   - ext4_writepages after taking page lock (have journal handle)
- *   - journal_submit_inode_data_buffers (no journal handle)
- *   - shrink_page_list via the kswapd/direct reclaim (no journal handle)
- *   - grab_page_cache when doing write_begin (have journal handle)
- *
- * We don't do any block allocation in this function. If we have page with
- * multiple blocks we need to write those buffer_heads that are mapped. This
- * is important for mmaped based write. So if we do with blocksize 1K
- * truncate(f, 1024);
- * a = mmap(f, 0, 4096);
- * a[0] = 'a';
- * truncate(f, 4096);
- * we have in the page first buffer_head mapped via page_mkwrite call back
- * but other buffer_heads would be unmapped but dirty (dirty done via the
- * do_wp_page). So writepage should write the first block. If we modify
- * the mmap area beyond 1024 we will again get a page_fault and the
- * page_mkwrite callback will do the block allocation and mark the
- * buffer_heads mapped.
- *
- * We redirty the page if we have any buffer_heads that is either delay or
- * unwritten in the page.
- *
- * We can get recursively called as show below.
- *
- *	ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
- *		ext4_writepage()
- *
- * But since we don't do any block allocation we should not deadlock.
- * Page also have the dirty flag cleared so we don't get recurive page_lock.
+ * This function is now used only when journaling data. We cannot start
+ * transaction directly because transaction start ranks above page lock so we
+ * have to do some magic.
  */
-static int ext4_writepage(struct page *page,
-			  struct writeback_control *wbc)
+static int ext4_journalled_writepage(struct page *page,
+				     struct writeback_control *wbc,
+				     void *data)
 {
 	struct folio *folio = page_folio(page);
-	int ret = 0;
 	loff_t size;
 	unsigned int len;
-	struct buffer_head *page_bufs = NULL;
 	struct inode *inode = page->mapping->host;
-	struct ext4_io_submit io_submit;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) {
 		folio_invalidate(folio, 0, folio_size(folio));
@@ -2036,60 +1987,16 @@ static int ext4_writepage(struct page *page,
 		return 0;
 	}
 
-	page_bufs = page_buffers(page);
-	/*
-	 * We cannot do block allocation or other extent handling in this
-	 * function. If there are buffers needing that, we have to redirty
-	 * the page. But we may reach here when we do a journal commit via
-	 * journal_submit_inode_data_buffers() and in that case we must write
-	 * allocated buffers to achieve data=ordered mode guarantees.
-	 *
-	 * Also, if there is only one buffer per page (the fs block
-	 * size == the page size), if one buffer needs block
-	 * allocation or needs to modify the extent tree to clear the
-	 * unwritten flag, we know that the page can't be written at
-	 * all, so we might as well refuse the write immediately.
-	 * Unfortunately if the block size != page size, we can't as
-	 * easily detect this case using ext4_walk_page_buffers(), but
-	 * for the extremely common case, this is an optimization that
-	 * skips a useless round trip through ext4_bio_write_page().
-	 */
-	if (ext4_walk_page_buffers(NULL, inode, page_bufs, 0, len, NULL,
-				   ext4_bh_delay_or_unwritten)) {
-		redirty_page_for_writepage(wbc, page);
-		if ((current->flags & PF_MEMALLOC) ||
-		    (inode->i_sb->s_blocksize == PAGE_SIZE)) {
-			/*
-			 * For memory cleaning there's no point in writing only
-			 * some buffers. So just bail out. Warn if we came here
-			 * from direct reclaim.
-			 */
-			WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD))
-							== PF_MEMALLOC);
-			unlock_page(page);
-			return 0;
-		}
-	}
-
-	if (PageChecked(page) && ext4_should_journal_data(inode))
-		/*
-		 * It's mmapped pagecache.  Add buffers and journal it.  There
-		 * doesn't seem much point in redirtying the page here.
-		 */
-		return __ext4_journalled_writepage(page, len);
-
-	ext4_io_submit_init(&io_submit, wbc);
-	io_submit.io_end = ext4_init_io_end(inode, GFP_NOFS);
-	if (!io_submit.io_end) {
-		redirty_page_for_writepage(wbc, page);
+	WARN_ON_ONCE(!ext4_should_journal_data(inode));
+	if (!PageChecked(page)) {
 		unlock_page(page);
-		return -ENOMEM;
+		return 0;
 	}
-	ret = ext4_bio_write_page(&io_submit, page, len);
-	ext4_io_submit(&io_submit);
-	/* Drop io_end reference we got from init */
-	ext4_put_io_end_defer(io_submit.io_end);
-	return ret;
+	/*
+	 * It's mmapped pagecache.  Add buffers and journal it.  There
+	 * doesn't seem much point in redirtying the page here.
+	 */
+	return __ext4_journalled_writepage(page, len);
 }
 
 static int mpage_submit_page(struct mpage_da_data *mpd, struct page *page)
@@ -2705,12 +2612,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	return err;
 }
 
-static int ext4_writepage_cb(struct page *page, struct writeback_control *wbc,
-			     void *data)
-{
-	return ext4_writepage(page, wbc);
-}
-
 static int ext4_do_writepages(struct mpage_da_data *mpd)
 {
 	struct writeback_control *wbc = mpd->wbc;
@@ -2738,7 +2639,8 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 
 	if (ext4_should_journal_data(inode)) {
 		blk_start_plug(&plug);
-		ret = write_cache_pages(mapping, wbc, ext4_writepage_cb, NULL);
+		ret = write_cache_pages(mapping, wbc, ext4_journalled_writepage,
+					NULL);
 		blk_finish_plug(&plug);
 		goto out_writepages;
 	}
@@ -3153,9 +3055,8 @@ static int ext4_da_write_end(struct file *file,
 	 * i_disksize since writeback will push i_disksize upto i_size
 	 * eventually. If the end of the current write is > i_size and
 	 * inside an allocated block (ext4_da_should_update_i_disksize()
-	 * check), we need to update i_disksize here as neither
-	 * ext4_writepage() nor certain ext4_writepages() paths not
-	 * allocating blocks update i_disksize.
+	 * check), we need to update i_disksize here as ext4_writepages() need
+	 * not do it in this case.
 	 *
 	 * Note that we defer inode dirtying to generic_write_end() /
 	 * ext4_da_write_inline_data_end().
@@ -5357,13 +5258,14 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
 
 	offset = inode->i_size & (PAGE_SIZE - 1);
 	/*
-	 * If the folio is fully truncated, we don't need to wait for any commit
-	 * (and we even should not as __ext4_journalled_invalidate_folio() may
-	 * strip all buffers from the folio but keep the folio dirty which can then
-	 * confuse e.g. concurrent ext4_writepage() seeing dirty folio without
-	 * buffers). Also we don't need to wait for any commit if all buffers in
-	 * the folio remain valid. This is most beneficial for the common case of
-	 * blocksize == PAGESIZE.
+	 * If the folio is fully truncated, we don't need to wait for any
+	 * commit (and we even should not as
+	 * __ext4_journalled_invalidate_folio() may strip all buffers from the
+	 * folio but keep the folio dirty which can then confuse e.g.
+	 * concurrent ext4_journalled_writepage() seeing dirty folio without
+	 * buffers). Also we don't need to wait for any commit if all buffers
+	 * in the folio remain valid. This is most beneficial for the common
+	 * case of blocksize == PAGESIZE.
 	 */
 	if (!offset || offset > (PAGE_SIZE - i_blocksize(inode)))
 		return;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs()
  2022-12-07 11:27 ` [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs() Jan Kara
@ 2022-12-07 11:59   ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2022-12-07 11:59 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM)

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage()
  2022-12-07 11:27 ` [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage() Jan Kara
@ 2022-12-08 14:23   ` Theodore Ts'o
  2022-12-08 15:40     ` Jan Kara
  2022-12-08 15:57     ` Theodore Ts'o
  0 siblings, 2 replies; 29+ messages in thread
From: Theodore Ts'o @ 2022-12-08 14:23 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Christoph Hellwig

On Wed, Dec 07, 2022 at 12:27:16PM +0100, Jan Kara wrote:
> ext4_writepage() should not be called for ordered data anymore. Remove
> support for it from the function.
> 
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jan Kara <jack@suse.cz>

This commit (determined via bisection) is causing a large number of
test failures for ext4/data_journal case:

ext4/data_journal: 521 tests, 33 failures, 98 skipped, 4248 seconds
  Failures: ext4/004 generic/012 generic/016 generic/021 generic/022 
    generic/029 generic/030 generic/031 generic/032 generic/058 
    generic/060 generic/061 generic/063 generic/071 generic/075 
    generic/098 generic/112 generic/127 generic/231 generic/393 
    generic/397 generic/404 generic/439 generic/455 generic/477 
    generic/491 generic/567 generic/572 generic/574 generic/577 
    generic/634 generic/639 generic/679

Many/most of these failures appear to be data plane failures.  For
example ext4/004 is a "dump | restore" followed by a "diff -r
$DUMP_DIR $RESTORE_DIR".  The generic/012, generic/021, generic/022
failures are md5sum checksum failures after testing the punch hole
operation.  The generic/029 failure are hexdump mismatches after
calling a combination of truncate, pwrites, and mmap'ed writes, etc.

Since this is the last patch in the series, and we've already dropped
the writepage hook (which is one of the things Christoph was going
for), so one approach might be drop this patch from the series at
least for this upcoming merge window.

Jan, what do you think?

					- Ted

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for data writeout
  2022-12-07 11:27 ` [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for " Jan Kara
@ 2022-12-08 15:28   ` Ritesh Harjani (IBM)
  0 siblings, 0 replies; 29+ messages in thread
From: Ritesh Harjani (IBM) @ 2022-12-08 15:28 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig

On 22/12/07 12:27PM, Jan Kara wrote:
> jbd2_submit_inode_data() hardcoded use of
> jbd2_journal_submit_inode_data_buffers() for submission of data pages.
> Make it use j_submit_inode_data_buffers hook instead. This effectively
> switches ext4 fastcommits to use ext4_writepages() for data writeout
> instead of generic_writepages().

Very neat!! I agree, that jbd2_submit_inode_data() should have always used
journal->j_submit_inode_data_buffers().

Looks good to me. Please feel free to add:
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>


>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/fast_commit.c | 2 +-
>  fs/jbd2/commit.c      | 5 ++---
>  include/linux/jbd2.h  | 2 +-
>  3 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 0f6d0a80467d..7c6694593497 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -986,7 +986,7 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
>  			finish_wait(&ei->i_fc_wait, &wait);
>  		}
>  		spin_unlock(&sbi->s_fc_lock);
> -		ret = jbd2_submit_inode_data(ei->jinode);
> +		ret = jbd2_submit_inode_data(journal, ei->jinode);
>  		if (ret)
>  			return ret;
>  		spin_lock(&sbi->s_fc_lock);
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 885a7a6cc53e..4810438b7856 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -207,14 +207,13 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
>  }
>
>  /* Send all the data buffers related to an inode */
> -int jbd2_submit_inode_data(struct jbd2_inode *jinode)
> +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode)
>  {
> -
>  	if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
>  		return 0;
>
>  	trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
> -	return jbd2_journal_submit_inode_data_buffers(jinode);
> +	return journal->j_submit_inode_data_buffers(jinode);
>
>  }
>  EXPORT_SYMBOL(jbd2_submit_inode_data);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 0b7242370b56..2170e0cc279d 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1662,7 +1662,7 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
>  int jbd2_fc_end_commit(journal_t *journal);
>  int jbd2_fc_end_commit_fallback(journal_t *journal);
>  int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out);
> -int jbd2_submit_inode_data(struct jbd2_inode *jinode);
> +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode);
>  int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
>  int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
>  int jbd2_fc_release_bufs(journal_t *journal);
> --
> 2.35.3
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage()
  2022-12-08 14:23   ` Theodore Ts'o
@ 2022-12-08 15:40     ` Jan Kara
  2022-12-08 15:57     ` Theodore Ts'o
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-08 15:40 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM),
	Christoph Hellwig

On Thu 08-12-22 09:23:56, Theodore Ts'o wrote:
> On Wed, Dec 07, 2022 at 12:27:16PM +0100, Jan Kara wrote:
> > ext4_writepage() should not be called for ordered data anymore. Remove
> > support for it from the function.
> > 
> > Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> This commit (determined via bisection) is causing a large number of
> test failures for ext4/data_journal case:
> 
> ext4/data_journal: 521 tests, 33 failures, 98 skipped, 4248 seconds
>   Failures: ext4/004 generic/012 generic/016 generic/021 generic/022 
>     generic/029 generic/030 generic/031 generic/032 generic/058 
>     generic/060 generic/061 generic/063 generic/071 generic/075 
>     generic/098 generic/112 generic/127 generic/231 generic/393 
>     generic/397 generic/404 generic/439 generic/455 generic/477 
>     generic/491 generic/567 generic/572 generic/574 generic/577 
>     generic/634 generic/639 generic/679
> 
> Many/most of these failures appear to be data plane failures.  For
> example ext4/004 is a "dump | restore" followed by a "diff -r
> $DUMP_DIR $RESTORE_DIR".  The generic/012, generic/021, generic/022
> failures are md5sum checksum failures after testing the punch hole
> operation.  The generic/029 failure are hexdump mismatches after
> calling a combination of truncate, pwrites, and mmap'ed writes, etc.
> 
> Since this is the last patch in the series, and we've already dropped
> the writepage hook (which is one of the things Christoph was going
> for), so one approach might be drop this patch from the series at
> least for this upcoming merge window.
> 
> Jan, what do you think?

Yes, please drop it and I'm sorry for the breakage. I did only very limited
testing with data=journal mode (which passed) and apparently something
unexpected breaks. I suspect some path doing writeout of committed (but not
yet checkpointed) data pages breaks. I'll have a look.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 10/13] ext4: Switch to using write_cache_pages() for data=journal writeout
  2022-12-07 11:27 ` [PATCH v4 10/13] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
@ 2022-12-08 15:40   ` Ritesh Harjani (IBM)
  2022-12-08 16:33     ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: Ritesh Harjani (IBM) @ 2022-12-08 15:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Christoph Hellwig, Christoph Hellwig

On 22/12/07 12:27PM, Jan Kara wrote:
> Instead of using generic_writepages(), let's use write_cache_pages() for
> writeout of journalled data. It will allow us to stop providing
> .writepage callback.

Ok. Just one quick query. I didn't look too deep for this and thought will
directly check it here.
What about marking an error via mapping_set_error() which earlier the
__writepage() call was handling in case of any error during writeback?

> Our data=journal writeback path would benefit from
> a larger cleanup and refactoring but that's for a separate cleanup
> series.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/inode.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index b93a436bf5bc..f3b3792c1c96 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2705,6 +2705,12 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  	return err;
>  }
>
> +static int ext4_writepage_cb(struct page *page, struct writeback_control *wbc,
> +			     void *data)
> +{
> +	return ext4_writepage(page, wbc);
> +}
> +
>  static int ext4_do_writepages(struct mpage_da_data *mpd)
>  {
>  	struct writeback_control *wbc = mpd->wbc;
> @@ -2731,7 +2737,9 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
>  		goto out_writepages;
>
>  	if (ext4_should_journal_data(inode)) {
> -		ret = generic_writepages(mapping, wbc);
> +		blk_start_plug(&plug);
> +		ret = write_cache_pages(mapping, wbc, ext4_writepage_cb, NULL);
> +		blk_finish_plug(&plug);
>  		goto out_writepages;
>  	}
>
> --
> 2.35.3
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage()
  2022-12-08 14:23   ` Theodore Ts'o
  2022-12-08 15:40     ` Jan Kara
@ 2022-12-08 15:57     ` Theodore Ts'o
  1 sibling, 0 replies; 29+ messages in thread
From: Theodore Ts'o @ 2022-12-08 15:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, Christoph Hellwig, Ritesh Harjani (IBM), Christoph Hellwig

On Thu, Dec 08, 2022 at 09:23:56AM -0500, Theodore Ts'o wrote:
> Since this is the last patch in the series, and we've already dropped
> the writepage hook (which is one of the things Christoph was going
> for), so one approach might be drop this patch from the series at
> least for this upcoming merge window.

And I've confirmed that dropping thie last patch resolves all of the
test regressions in the ext4/data_journal config.  So that's going to
my plan of record for the ext4 dev branch, unless I hear
differently from Jan.

							- Ted

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 10/13] ext4: Switch to using write_cache_pages() for data=journal writeout
  2022-12-08 15:40   ` Ritesh Harjani (IBM)
@ 2022-12-08 16:33     ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2022-12-08 16:33 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: Jan Kara, Ted Tso, linux-ext4, Christoph Hellwig, Christoph Hellwig

On Thu 08-12-22 21:10:46, Ritesh Harjani (IBM) wrote:
> On 22/12/07 12:27PM, Jan Kara wrote:
> > Instead of using generic_writepages(), let's use write_cache_pages() for
> > writeout of journalled data. It will allow us to stop providing
> > .writepage callback.
> 
> Ok. Just one quick query. I didn't look too deep for this and thought will
> directly check it here.
> What about marking an error via mapping_set_error() which earlier the
> __writepage() call was handling in case of any error during writeback?

So yes, I have noticed we loose that call and decided we'll stay compatible
(arguably bug-to-bug) with what we do for data=ordered mode. If error
happens in ext4_writepage() (i.e., during adding buffers to transaction) we
will propagate it back up to the ->writepages() caller. I agree we should
probably tweak ext4_writepages() to also do mapping_set_error() just in
case this is writeback from flush worker so that application can learn
about the problem.

I'll add this to the larger cleanup of our writepages path. Thanks for the
comment.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2022-12-07 11:27 ` [PATCH v4 12/13] ext4: Stop providing .writepage hook Jan Kara
@ 2023-05-08 17:51   ` youling257
  2023-05-09  0:25     ` Jan Kara
  2023-05-09  5:02     ` Eric Biggers
  0 siblings, 2 replies; 29+ messages in thread
From: youling257 @ 2023-05-08 17:51 UTC (permalink / raw)
  To: jack; +Cc: hch, hch, linux-ext4, ritesh.list, tytso

I using linux mainline kernel on android. https://github.com/youling257/android-mainline/commits/6.4  https://github.com/youling257/android-mainline/commits/6.3
"ext4: Stop providing .writepage hook" cause some android app unable to read storage/emulated/0 files, i need to say android esdfs file system storage/emulated is ext4 data/media bind mount.
I want to ask, why android storage/emulated need .writepage hook?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2023-05-08 17:51   ` youling257
@ 2023-05-09  0:25     ` Jan Kara
  2023-05-09  5:02     ` Eric Biggers
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kara @ 2023-05-09  0:25 UTC (permalink / raw)
  To: youling257; +Cc: jack, hch, hch, linux-ext4, ritesh.list, tytso

Hello,

On Tue 09-05-23 01:51:08, youling257 wrote:
> I using linux mainline kernel on android.
> https://github.com/youling257/android-mainline/commits/6.4
> https://github.com/youling257/android-mainline/commits/6.3 "ext4: Stop
> providing .writepage hook" cause some android app unable to read
> storage/emulated/0 files, i need to say android esdfs file system
> storage/emulated is ext4 data/media bind mount.  I want to ask, why
> android storage/emulated need .writepage hook?

Honestly, I don't know. I guess you need to look into implementation of
esdfs in the Android kernel and why it needs filesystem's .writepage
hook...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2023-05-08 17:51   ` youling257
  2023-05-09  0:25     ` Jan Kara
@ 2023-05-09  5:02     ` Eric Biggers
  2023-05-09 18:36       ` Theodore Ts'o
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Biggers @ 2023-05-09  5:02 UTC (permalink / raw)
  To: youling257; +Cc: jack, hch, hch, linux-ext4, ritesh.list, tytso

On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
> I using linux mainline kernel on android. https://github.com/youling257/android-mainline/commits/6.4  https://github.com/youling257/android-mainline/commits/6.3
> "ext4: Stop providing .writepage hook" cause some android app unable to read storage/emulated/0 files, i need to say android esdfs file system storage/emulated is ext4 data/media bind mount.
> I want to ask, why android storage/emulated need .writepage hook?

"esdfs" doesn't exist upstream, so linux-ext4 can't provide support for it.

Also, it doesn't exist in the Android Common Kernels either, so the Android team
cannot help you either.

- Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2023-05-09  5:02     ` Eric Biggers
@ 2023-05-09 18:36       ` Theodore Ts'o
  2023-05-10  5:17         ` youling 257
  0 siblings, 1 reply; 29+ messages in thread
From: Theodore Ts'o @ 2023-05-09 18:36 UTC (permalink / raw)
  To: Eric Biggers
  Cc: youling257, jack, hch, hch, linux-ext4, ritesh.list, keescook

On Mon, May 08, 2023 at 10:02:27PM -0700, Eric Biggers wrote:
> On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
> > I using linux mainline kernel on android. https://github.com/youling257/android-mainline/commits/6.4  https://github.com/youling257/android-mainline/commits/6.3
> > "ext4: Stop providing .writepage hook" cause some android app unable to read storage/emulated/0 files, i need to say android esdfs file system storage/emulated is ext4 data/media bind mount.
> > I want to ask, why android storage/emulated need .writepage hook?
> 
> "esdfs" doesn't exist upstream, so linux-ext4 can't provide support for it.
> 
> Also, it doesn't exist in the Android Common Kernels either, so the Android team
> cannot help you either.

The problem with esdfs is that it's based on the old stackable file
system paradigm which is filled with races and is inherently
unreliable (just for fun, try running fsstress on the upper and lower
file systems of a stackable file system simultaneously, and watch the
kernel crash and burn).  For that reason, some number of us have been
working for a while to eliminate the need for stacking file systems,
such as sdcardfs. esdfs, etc. from the Android kernel.

The other thing I would add is that upstream has been working[1] on
getting rid of writepage function.  So out-of-tree file systems are
going to need to adapt --- or die.

[1] https://lore.kernel.org/all/20221202102644.770505-1-hch@lst.de/

It looks like esdfs is coming from the Chromium kernel?  The latest
Chromium kernel I can find is 5.15 based, and it has esdfs in it.  I'm
sad to see that esdfs hasn't been removed from the Chromium kernel
yet, and replaced with something more stable and reliable, but maybe
we can find someone who is more familiar with the Chromium kernel to
comment.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2023-05-09 18:36       ` Theodore Ts'o
@ 2023-05-10  5:17         ` youling 257
  2023-05-10  5:47           ` youling 257
  0 siblings, 1 reply; 29+ messages in thread
From: youling 257 @ 2023-05-10  5:17 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Eric Biggers, jack, hch, hch, linux-ext4, ritesh.list, keescook

It isn't android storage emulated esdfs or sdcardfs problem, i test
mount bind /data/media on /storage/emulated, can reproduce my problem.
Let me say clear my problem, "ext4: Stop providing .writepage hook"
cause android gallery app unable read pictures thumbnails.

I test linux kernel 6.3 Revert "ext4: Stop providing .writepage hook",
android gallery can read pictures thumbnails,

this is mount bind /data/media on /storage/emulated
/dev/nvme0n1p1 on /storage/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/default/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/read/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/write/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/full/emulated type ext4 (rw,seclabel,noatime)

this is esdfs,
/data/media on /mnt/runtime/default/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /storage/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /mnt/runtime/read/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:750:750,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /mnt/runtime/write/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /mnt/runtime/full/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)

if i do mount bind data/media on storage/emulated, take a screenshot,
will create /data/media/0/Pictures/Screenshots/Screenshot_20230510-130020.png
file,
chmod -R 0777 /data/media/0/Pictures/Screenshots,
on linux 6.3 Revert "ext4: Stop providing .writepage hook", android
gallery app can read
/storage/emulated/0/Pictures/Screenshots/Screenshot_20230510-130020.png
thumbnail,
linux 6.4 removed .writepage hook, android gallery unable read thumbnail.


2023-05-10 2:36 GMT+08:00, Theodore Ts'o <tytso@mit.edu>:
> On Mon, May 08, 2023 at 10:02:27PM -0700, Eric Biggers wrote:
>> On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
>> > I using linux mainline kernel on android.
>> > https://github.com/youling257/android-mainline/commits/6.4
>> > https://github.com/youling257/android-mainline/commits/6.3
>> > "ext4: Stop providing .writepage hook" cause some android app unable to
>> > read storage/emulated/0 files, i need to say android esdfs file system
>> > storage/emulated is ext4 data/media bind mount.
>> > I want to ask, why android storage/emulated need .writepage hook?
>>
>> "esdfs" doesn't exist upstream, so linux-ext4 can't provide support for
>> it.
>>
>> Also, it doesn't exist in the Android Common Kernels either, so the
>> Android team
>> cannot help you either.
>
> The problem with esdfs is that it's based on the old stackable file
> system paradigm which is filled with races and is inherently
> unreliable (just for fun, try running fsstress on the upper and lower
> file systems of a stackable file system simultaneously, and watch the
> kernel crash and burn).  For that reason, some number of us have been
> working for a while to eliminate the need for stacking file systems,
> such as sdcardfs. esdfs, etc. from the Android kernel.
>
> The other thing I would add is that upstream has been working[1] on
> getting rid of writepage function.  So out-of-tree file systems are
> going to need to adapt --- or die.
>
> [1] https://lore.kernel.org/all/20221202102644.770505-1-hch@lst.de/
>
> It looks like esdfs is coming from the Chromium kernel?  The latest
> Chromium kernel I can find is 5.15 based, and it has esdfs in it.  I'm
> sad to see that esdfs hasn't been removed from the Chromium kernel
> yet, and replaced with something more stable and reliable, but maybe
> we can find someone who is more familiar with the Chromium kernel to
> comment.
>
> Cheers,
>
> 						- Ted
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2023-05-10  5:17         ` youling 257
@ 2023-05-10  5:47           ` youling 257
  2023-05-10  6:50             ` Eric Biggers
  0 siblings, 1 reply; 29+ messages in thread
From: youling 257 @ 2023-05-10  5:47 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Eric Biggers, jack, hch, hch, linux-ext4, ritesh.list, keescook

I do more test, it is android esdfs or sdcardfs
/storage/emulated/0/Android/data problem,
"ext4: Stop providing .writepage hook" cause
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
unable read,

on linux 6.4, i use mount bind data/media on storage/emulated, chmod
-R 0777 /data/media/0, rm
/storage/emulated/0/Android/data/com.android.gallery3d/cache/*, open
gallery app can read pictures thumbnail,
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
available read.


2023-05-10 13:17 GMT+08:00, youling 257 <youling257@gmail.com>:
> It isn't android storage emulated esdfs or sdcardfs problem, i test
> mount bind /data/media on /storage/emulated, can reproduce my problem.
> Let me say clear my problem, "ext4: Stop providing .writepage hook"
> cause android gallery app unable read pictures thumbnails.
>
> I test linux kernel 6.3 Revert "ext4: Stop providing .writepage hook",
> android gallery can read pictures thumbnails,
>
> this is mount bind /data/media on /storage/emulated
> /dev/nvme0n1p1 on /storage/emulated type ext4 (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/default/emulated type ext4
> (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/read/emulated type ext4
> (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/write/emulated type ext4
> (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/full/emulated type ext4
> (rw,seclabel,noatime)
>
> this is esdfs,
> /data/media on /mnt/runtime/default/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /storage/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /mnt/runtime/read/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:750:750,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /mnt/runtime/write/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /mnt/runtime/full/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
>
> if i do mount bind data/media on storage/emulated, take a screenshot,
> will create
> /data/media/0/Pictures/Screenshots/Screenshot_20230510-130020.png
> file,
> chmod -R 0777 /data/media/0/Pictures/Screenshots,
> on linux 6.3 Revert "ext4: Stop providing .writepage hook", android
> gallery app can read
> /storage/emulated/0/Pictures/Screenshots/Screenshot_20230510-130020.png
> thumbnail,
> linux 6.4 removed .writepage hook, android gallery unable read thumbnail.
>
>
> 2023-05-10 2:36 GMT+08:00, Theodore Ts'o <tytso@mit.edu>:
>> On Mon, May 08, 2023 at 10:02:27PM -0700, Eric Biggers wrote:
>>> On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
>>> > I using linux mainline kernel on android.
>>> > https://github.com/youling257/android-mainline/commits/6.4
>>> > https://github.com/youling257/android-mainline/commits/6.3
>>> > "ext4: Stop providing .writepage hook" cause some android app unable
>>> > to
>>> > read storage/emulated/0 files, i need to say android esdfs file system
>>> > storage/emulated is ext4 data/media bind mount.
>>> > I want to ask, why android storage/emulated need .writepage hook?
>>>
>>> "esdfs" doesn't exist upstream, so linux-ext4 can't provide support for
>>> it.
>>>
>>> Also, it doesn't exist in the Android Common Kernels either, so the
>>> Android team
>>> cannot help you either.
>>
>> The problem with esdfs is that it's based on the old stackable file
>> system paradigm which is filled with races and is inherently
>> unreliable (just for fun, try running fsstress on the upper and lower
>> file systems of a stackable file system simultaneously, and watch the
>> kernel crash and burn).  For that reason, some number of us have been
>> working for a while to eliminate the need for stacking file systems,
>> such as sdcardfs. esdfs, etc. from the Android kernel.
>>
>> The other thing I would add is that upstream has been working[1] on
>> getting rid of writepage function.  So out-of-tree file systems are
>> going to need to adapt --- or die.
>>
>> [1] https://lore.kernel.org/all/20221202102644.770505-1-hch@lst.de/
>>
>> It looks like esdfs is coming from the Chromium kernel?  The latest
>> Chromium kernel I can find is 5.15 based, and it has esdfs in it.  I'm
>> sad to see that esdfs hasn't been removed from the Chromium kernel
>> yet, and replaced with something more stable and reliable, but maybe
>> we can find someone who is more familiar with the Chromium kernel to
>> comment.
>>
>> Cheers,
>>
>> 						- Ted
>>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2023-05-10  5:47           ` youling 257
@ 2023-05-10  6:50             ` Eric Biggers
  2023-05-10 22:00               ` Theodore Ts'o
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Biggers @ 2023-05-10  6:50 UTC (permalink / raw)
  To: youling 257
  Cc: Theodore Ts'o, jack, hch, hch, linux-ext4, ritesh.list, keescook

On Wed, May 10, 2023 at 01:47:58PM +0800, youling 257 wrote:
> I do more test, it is android esdfs or sdcardfs
> /storage/emulated/0/Android/data problem,
> "ext4: Stop providing .writepage hook" cause
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> unable read,
> 
> on linux 6.4, i use mount bind data/media on storage/emulated, chmod
> -R 0777 /data/media/0, rm
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/*, open
> gallery app can read pictures thumbnail,
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> available read.

Maybe try reverting your commit that added esdfs to your kernel?  It should not
be needed at all.

- Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook
  2023-05-10  6:50             ` Eric Biggers
@ 2023-05-10 22:00               ` Theodore Ts'o
  0 siblings, 0 replies; 29+ messages in thread
From: Theodore Ts'o @ 2023-05-10 22:00 UTC (permalink / raw)
  To: Eric Biggers
  Cc: youling 257, jack, hch, hch, linux-ext4, ritesh.list, keescook

On Tue, May 09, 2023 at 11:50:36PM -0700, Eric Biggers wrote:
> On Wed, May 10, 2023 at 01:47:58PM +0800, youling 257 wrote:
> > I do more test, it is android esdfs or sdcardfs
> > /storage/emulated/0/Android/data problem,
> > "ext4: Stop providing .writepage hook" cause
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> > unable read,
> > 
> > on linux 6.4, i use mount bind data/media on storage/emulated, chmod
> > -R 0777 /data/media/0, rm
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/*, open
> > gallery app can read pictures thumbnail,
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> > available read.
> 
> Maybe try reverting your commit that added esdfs to your kernel?  It should not
> be needed at all.

Youling, what version of Android are you trying to run with the latest
bleeding edge kernel?  Starting with Android 11, sdcardfs was
deprecated[1].

    SDCardFS is deprecated on devices that launch with Android 11 or
    higher and run kernel version 5.4 or higher. On such devices, VTS
    testing doesn't allow mounted file systems listed as
    SDCardFS. Devices that launch with Android 11 or higher but run
    kernel version 4.19 or lower can continue to use SDCardFS, but
    Google doesn't provide additional support.

[1] https://source.android.com/docs/core/storage/sdcardfs-deprecate

With newer versions of Android, use of something like sdcardfs or
esdfs is not necessary.  If you are using an older version of Android,
and you insist on use a bleeding edge kernel where the writepage is
getting deprecated, then someone will need to update esdfs or deal
with the changing internal interfaces of the upstream kernel.  This is
not the ext4 upstream developer's problem.

Personally, I would recommend that you *not* try to fix esdfs; that's
because stacking file systems like sdcardfs and esdfs are inherently
unreliable.   See the section in [1] entitled, "Why deprecate sdcardfs?".

Instead, what I would recommend is upgrading to a newer version of
Android, and then dropping esdfs from your kernel repository.

	     	  	   	      	   	  - Ted

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-05-10 22:00 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-07 11:27 [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data Jan Kara
2022-12-07 11:27 ` [PATCH v4 01/13] ext4: Handle redirtying in ext4_bio_write_page() Jan Kara
2022-12-07 11:27 ` [PATCH v4 02/13] ext4: Move keep_towrite handling to ext4_bio_write_page() Jan Kara
2022-12-07 11:27 ` [PATCH v4 03/13] ext4: Remove nr_submitted from ext4_bio_write_page() Jan Kara
2022-12-07 11:27 ` [PATCH v4 04/13] ext4: Drop pointless IO submission " Jan Kara
2022-12-07 11:27 ` [PATCH v4 05/13] ext4: Add support for writepages calls that cannot map blocks Jan Kara
2022-12-07 11:27 ` [PATCH v4 06/13] ext4: Provide ext4_do_writepages() Jan Kara
2022-12-07 11:27 ` [PATCH v4 07/13] ext4: Move percpu_rwsem protection into ext4_writepages() Jan Kara
2022-12-07 11:27 ` [PATCH v4 08/13] ext4: Switch to using ext4_do_writepages() for ordered data writeout Jan Kara
2022-12-07 11:27 ` [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for " Jan Kara
2022-12-08 15:28   ` Ritesh Harjani (IBM)
2022-12-07 11:27 ` [PATCH v4 10/13] ext4: Switch to using write_cache_pages() for data=journal writeout Jan Kara
2022-12-08 15:40   ` Ritesh Harjani (IBM)
2022-12-08 16:33     ` Jan Kara
2022-12-07 11:27 ` [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs() Jan Kara
2022-12-07 11:59   ` Christoph Hellwig
2022-12-07 11:27 ` [PATCH v4 12/13] ext4: Stop providing .writepage hook Jan Kara
2023-05-08 17:51   ` youling257
2023-05-09  0:25     ` Jan Kara
2023-05-09  5:02     ` Eric Biggers
2023-05-09 18:36       ` Theodore Ts'o
2023-05-10  5:17         ` youling 257
2023-05-10  5:47           ` youling 257
2023-05-10  6:50             ` Eric Biggers
2023-05-10 22:00               ` Theodore Ts'o
2022-12-07 11:27 ` [PATCH v4 13/13] ext4: Remove ordered data support from ext4_writepage() Jan Kara
2022-12-08 14:23   ` Theodore Ts'o
2022-12-08 15:40     ` Jan Kara
2022-12-08 15:57     ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).