linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: cluster-devel@redhat.com, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	ocfs2-devel@oss.oracle.com
Subject: Re: [PATCH v6 08/19] mm: Add readahead address space operation
Date: Tue, 18 Feb 2020 19:35:34 -0800	[thread overview]
Message-ID: <20200219033534.GD1075@sol.localdomain> (raw)
In-Reply-To: <20200219031044.GA1075@sol.localdomain>

On Tue, Feb 18, 2020 at 07:10:44PM -0800, Eric Biggers wrote:
> On Mon, Feb 17, 2020 at 10:45:54AM -0800, Matthew Wilcox wrote:
> > diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> > index 7d4d09dd5e6d..81ab30fbe45c 100644
> > --- a/Documentation/filesystems/vfs.rst
> > +++ b/Documentation/filesystems/vfs.rst
> > @@ -706,6 +706,7 @@ cache in your filesystem.  The following members are defined:
> >  		int (*readpage)(struct file *, struct page *);
> >  		int (*writepages)(struct address_space *, struct writeback_control *);
> >  		int (*set_page_dirty)(struct page *page);
> > +		void (*readahead)(struct readahead_control *);
> >  		int (*readpages)(struct file *filp, struct address_space *mapping,
> >  				 struct list_head *pages, unsigned nr_pages);
> >  		int (*write_begin)(struct file *, struct address_space *mapping,
> > @@ -781,12 +782,24 @@ cache in your filesystem.  The following members are defined:
> >  	If defined, it should set the PageDirty flag, and the
> >  	PAGECACHE_TAG_DIRTY tag in the radix tree.
> >  
> > +``readahead``
> > +	Called by the VM to read pages associated with the address_space
> > +	object.  The pages are consecutive in the page cache and are
> > +	locked.  The implementation should decrement the page refcount
> > +	after starting I/O on each page.  Usually the page will be
> > +	unlocked by the I/O completion handler.  If the function does
> > +	not attempt I/O on some pages, the caller will decrement the page
> > +	refcount and unlock the pages for you.	Set PageUptodate if the
> > +	I/O completes successfully.  Setting PageError on any page will
> > +	be ignored; simply unlock the page if an I/O error occurs.
> > +
> 
> This is unclear about how "not attempting I/O" works and how that affects who is
> responsible for putting and unlocking the pages.  How does the caller know which
> pages were not attempted?  Can any arbitrary subset of pages be not attempted,
> or just the last N pages?
> 
> In the code, the caller actually uses readahead_for_each() to iterate through
> and put+unlock the pages.  That implies that ->readahead() must also use
> readahead_for_each() as well, in order for the iterator to be advanced
> correctly... Right?  And the ownership of each page is transferred to the callee
> when the callee advances the iterator past that page.
> 
> I don't see how ext4_readahead() and f2fs_readahead() can work at all, given
> that they don't advance the iterator.
> 

Yep, this patchset immediately crashes on boot with:

BUG: Bad page state in process swapper/0  pfn:02176
page:ffffea00000751d0 refcount:0 mapcount:0 mapping:ffff88807cba0400 index:0x0
ext4_da_aops name:"systemd"
flags: 0x100000000020001(locked|mappedtodisk)
raw: 0100000000020001 dead000000000100 dead000000000122 ffff88807cba0400
raw: 0000000000000000 0000000000000000 00000000ffffffff
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1(locked)
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2-00019-g7203ed9018cb9 #18
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x7a/0xaa lib/dump_stack.c:118
 bad_page.cold+0x89/0xba mm/page_alloc.c:649
 free_pages_check_bad+0x5d/0x60 mm/page_alloc.c:1050
 free_pages_check mm/page_alloc.c:1059 [inline]
 free_pages_prepare mm/page_alloc.c:1157 [inline]
 free_pcp_prepare+0x1c1/0x200 mm/page_alloc.c:1198
 free_unref_page_prepare mm/page_alloc.c:3011 [inline]
 free_unref_page+0x16/0x70 mm/page_alloc.c:3060
 __put_single_page mm/swap.c:81 [inline]
 __put_page+0x31/0x40 mm/swap.c:115
 put_page include/linux/mm.h:1029 [inline]
 ext4_mpage_readpages+0x778/0x9b0 fs/ext4/readpage.c:405
 ext4_readahead+0x2f/0x40 fs/ext4/inode.c:3242
 read_pages+0x4c/0x200 mm/readahead.c:126
 page_cache_readahead_limit+0x224/0x250 mm/readahead.c:241
 __do_page_cache_readahead mm/readahead.c:266 [inline]
 ra_submit mm/internal.h:62 [inline]
 ondemand_readahead+0x1df/0x4d0 mm/readahead.c:544
 page_cache_sync_readahead+0x2d/0x40 mm/readahead.c:579
 generic_file_buffered_read+0x77e/0xa90 mm/filemap.c:2029
 generic_file_read_iter+0xd4/0x130 mm/filemap.c:2302
 ext4_file_read_iter fs/ext4/file.c:131 [inline]
 ext4_file_read_iter+0x53/0x180 fs/ext4/file.c:114
 call_read_iter include/linux/fs.h:1897 [inline]
 new_sync_read+0x113/0x1a0 fs/read_write.c:414
 __vfs_read+0x21/0x30 fs/read_write.c:427
 vfs_read+0xcb/0x160 fs/read_write.c:461
 kernel_read+0x2c/0x40 fs/read_write.c:440
 prepare_binprm+0x14f/0x190 fs/exec.c:1589
 __do_execve_file.isra.0+0x4c0/0x800 fs/exec.c:1806
 do_execveat_common fs/exec.c:1871 [inline]
 do_execve+0x20/0x30 fs/exec.c:1888
 run_init_process+0xc8/0xcd init/main.c:1279
 try_to_run_init_process+0x10/0x36 init/main.c:1288
 kernel_init+0xac/0xfd init/main.c:1385
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Disabling lock debugging due to kernel taint
page:ffffea00000751d0 refcount:0 mapcount:0 mapping:ffff88807cba0400 index:0x0
ext4_da_aops name:"systemd"
flags: 0x100000000020001(locked|mappedtodisk)
raw: 0100000000020001 dead000000000100 dead000000000122 ffff88807cba0400
raw: 0000000000000000 0000000000000000 00000000ffffffff
page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)


I had to add:

diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index e14841ade6122..cb982088b5225 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -401,8 +401,10 @@ int ext4_mpage_readpages(struct address_space *mapping,
 		else
 			unlock_page(page);
 	next_page:
-		if (rac)
+		if (rac) {
 			put_page(page);
+			readahead_next(rac);
+		}
 	}
 	if (bio)
 		submit_bio(bio);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 87964e4cb6b81..e16b0fe42e2e5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2238,8 +2238,10 @@ int f2fs_mpage_readpages(struct inode *inode, struct readahead_control *rac,
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 next_page:
 #endif
-		if (rac)
+		if (rac) {
 			put_page(page);
+			readahead_next(rac);
+		}
 
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 		if (f2fs_compressed_file(inode)) {



  reply	other threads:[~2020-02-19  3:35 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-17 18:45 [PATCH v6 00/19] Change readahead API Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 01/19] mm: Return void from various readahead functions Matthew Wilcox
2020-02-18  4:47   ` Dave Chinner
2020-02-18 21:05   ` John Hubbard
2020-02-18 21:21     ` Matthew Wilcox
2020-02-18 21:52       ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 02/19] mm: Ignore return value of ->readpages Matthew Wilcox
2020-02-18  4:48   ` Dave Chinner
2020-02-18 21:33   ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 03/19] mm: Use readahead_control to pass arguments Matthew Wilcox
2020-02-18  5:03   ` Dave Chinner
2020-02-18 13:56     ` Matthew Wilcox
2020-02-18 22:46       ` Dave Chinner
2020-02-18 22:52         ` Matthew Wilcox
2020-02-18 22:22   ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 04/19] mm: Rearrange readahead loop Matthew Wilcox
2020-02-18  5:08   ` Dave Chinner
2020-02-18 13:57     ` Matthew Wilcox
2020-02-18 22:48       ` Dave Chinner
2020-02-18 22:33   ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 04/16] mm: Tweak readahead loop slightly Matthew Wilcox
2020-02-18 22:57   ` John Hubbard
2020-02-18 23:00     ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 05/16] mm: Put readahead pages in cache earlier Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 05/19] mm: Remove 'page_offset' from readahead loop Matthew Wilcox
2020-02-18  5:14   ` Dave Chinner
2020-02-18 23:08   ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 06/16] mm: Add readahead address space operation Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 06/19] mm: rename readahead loop variable to 'i' Matthew Wilcox
2020-02-18  5:33   ` Dave Chinner
2020-02-18 23:11   ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 07/16] mm: Add page_cache_readahead_limit Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 07/19] mm: Put readahead pages in cache earlier Matthew Wilcox
2020-02-18  6:14   ` Dave Chinner
2020-02-18 15:42     ` Matthew Wilcox
2020-02-19  0:59       ` Dave Chinner
2020-02-19  0:01   ` John Hubbard
2020-02-19  1:02     ` Matthew Wilcox
2020-02-19  1:13       ` John Hubbard
2020-02-19  3:24       ` John Hubbard
2020-02-19 14:41     ` Matthew Wilcox
2020-02-19 14:52       ` Christoph Hellwig
2020-02-19 15:01         ` Matthew Wilcox
2020-02-19 20:24           ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 08/16] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 08/19] mm: Add readahead address space operation Matthew Wilcox
2020-02-18  6:21   ` Dave Chinner
2020-02-18 16:10     ` Matthew Wilcox
2020-02-19  1:04       ` Dave Chinner
2020-02-19  0:12   ` John Hubbard
2020-02-19  3:10   ` Eric Biggers
2020-02-19  3:35     ` Eric Biggers [this message]
2020-02-19 16:52     ` Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 09/16] btrfs: Convert from readpages to readahead Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 09/19] mm: Add page_cache_readahead_limit Matthew Wilcox
2020-02-18  6:31   ` Dave Chinner
2020-02-18 19:54     ` Matthew Wilcox
2020-02-19  1:08       ` Dave Chinner
2020-02-19  1:32   ` John Hubbard
2020-02-19  2:23     ` Matthew Wilcox
2020-02-19  2:46       ` John Hubbard
2020-02-17 18:45 ` [PATCH v6 10/16] erofs: Convert uncompressed files from readpages to readahead Matthew Wilcox
2020-02-17 18:45 ` [PATCH v6 10/19] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
2020-02-18  1:51   ` [Ocfs2-devel] " Joseph Qi
2020-02-18  6:37   ` Dave Chinner
2020-02-19  2:48   ` John Hubbard
2020-02-19  3:28   ` Eric Biggers
2020-02-19  3:47     ` Matthew Wilcox
2020-02-19  3:55       ` Eric Biggers
2020-02-17 18:45 ` [PATCH v6 11/19] btrfs: Convert from readpages to readahead Matthew Wilcox
2020-02-18  6:57   ` Dave Chinner
2020-02-18 21:12     ` Matthew Wilcox
2020-02-19  1:23       ` Dave Chinner
2020-02-17 18:46 ` [PATCH v6 11/16] erofs: Convert compressed files " Matthew Wilcox
2020-02-19  2:34   ` Gao Xiang
2020-02-17 18:46 ` [PATCH v6 12/19] erofs: Convert uncompressed " Matthew Wilcox
2020-02-19  2:39   ` Gao Xiang
2020-02-19  3:04   ` Dave Chinner
2020-02-17 18:46 ` [PATCH v6 12/16] ext4: Convert " Matthew Wilcox
2020-02-17 18:46 ` [PATCH v6 13/19] erofs: Convert compressed files " Matthew Wilcox
2020-02-19  3:08   ` Dave Chinner
2020-02-17 18:46 ` [PATCH v6 13/16] f2fs: Convert " Matthew Wilcox
2020-02-17 18:46 ` [PATCH v6 14/19] ext4: " Matthew Wilcox
2020-02-19  3:16   ` Dave Chinner
2020-02-19  3:29   ` Eric Biggers
2020-02-17 18:46 ` [PATCH v6 14/16] fuse: " Matthew Wilcox
2020-02-17 18:46 ` [PATCH v6 15/19] f2fs: " Matthew Wilcox
2020-02-17 18:46 ` [PATCH v6 15/16] iomap: " Matthew Wilcox
2020-02-17 18:46 ` [PATCH v6 16/19] fuse: " Matthew Wilcox
2020-02-19  3:22   ` Dave Chinner
2020-02-17 18:46 ` [PATCH v6 16/16] mm: Use memalloc_nofs_save in readahead path Matthew Wilcox
2020-02-17 18:46 ` [PATCH v6 17/19] iomap: Restructure iomap_readpages_actor Matthew Wilcox
2020-02-19  3:17   ` John Hubbard
2020-02-19  5:35     ` Matthew Wilcox
2020-02-19  3:29   ` Dave Chinner
2020-02-19  6:04     ` Matthew Wilcox
2020-02-19  6:40       ` Dave Chinner
2020-02-19 17:06         ` Matthew Wilcox
2020-02-17 18:46 ` [PATCH v6 18/19] iomap: Convert from readpages to readahead Matthew Wilcox
2020-02-19  3:40   ` Dave Chinner
2020-02-17 18:46 ` [PATCH v6 19/19] mm: Use memalloc_nofs_save in readahead path Matthew Wilcox
2020-02-19  3:43   ` Dave Chinner
2020-02-19  5:22     ` Matthew Wilcox
2020-02-17 18:48 ` [PATCH v6 00/19] Change readahead API Matthew Wilcox
2020-02-18  4:56 ` Dave Chinner
2020-02-18 13:42   ` Matthew Wilcox
2020-02-18 21:26     ` Dave Chinner
2020-02-19  3:45       ` Dave Chinner
2020-02-19  3:48         ` Matthew Wilcox
2020-02-19  3:57           ` Dave Chinner
2020-02-18 20:49 ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200219033534.GD1075@sol.localdomain \
    --to=ebiggers@kernel.org \
    --cc=cluster-devel@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).