Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] mm/readahead: Handle ractl nr_pages being modified
@ 2021-04-20 20:01 Matthew Wilcox (Oracle)
  2021-04-20 20:06 ` Matthew Wilcox
  2021-04-20 20:12 ` Jeff Layton
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-04-20 20:01 UTC (permalink / raw)
  To: David Howells, Jeff Layton, linux-fsdevel; +Cc: Matthew Wilcox (Oracle)

The BUG_ON that checks whether the ractl is still in sync with the
local variables can trigger under some fairly unusual circumstances.
Remove the BUG_ON and resync the loop counter after every call to
read_pages().

One way I've seen to trigger it is:

 - Start out with a partially populated range in the page cache
 - Allocate some pages and run into an existing page
 - Send the read request off to the filesystem
 - The page we ran into is removed from the page cache
 - readahead_expand() succeeds in expanding upwards
 - Return to page_cache_ra_unbounded() and we hit the BUG_ON, as nr_pages
   has been adjusted upwards.

Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index f02dbebf1cef..989a8e710100 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -198,8 +198,6 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	for (i = 0; i < nr_to_read; i++) {
 		struct page *page = xa_load(&mapping->i_pages, index + i);
 
-		BUG_ON(index + i != ractl->_index + ractl->_nr_pages);
-
 		if (page && !xa_is_value(page)) {
 			/*
 			 * Page already present?  Kick off the current batch
@@ -210,6 +208,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl, &page_pool, true);
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 
@@ -223,6 +222,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 					gfp_mask) < 0) {
 			put_page(page);
 			read_pages(ractl, &page_pool, true);
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 		if (i == nr_to_read - lookahead_size)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/readahead: Handle ractl nr_pages being modified
  2021-04-20 20:01 [PATCH] mm/readahead: Handle ractl nr_pages being modified Matthew Wilcox (Oracle)
@ 2021-04-20 20:06 ` Matthew Wilcox
  2021-04-20 20:12 ` Jeff Layton
  1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2021-04-20 20:06 UTC (permalink / raw)
  To: David Howells, Jeff Layton, linux-fsdevel

On Tue, Apr 20, 2021 at 09:01:15PM +0100, Matthew Wilcox (Oracle) wrote:
> The BUG_ON that checks whether the ractl is still in sync with the
> local variables can trigger under some fairly unusual circumstances.
> Remove the BUG_ON and resync the loop counter after every call to
> read_pages().
> 
> One way I've seen to trigger it is:
> 
>  - Start out with a partially populated range in the page cache
>  - Allocate some pages and run into an existing page
>  - Send the read request off to the filesystem
>  - The page we ran into is removed from the page cache
>  - readahead_expand() succeeds in expanding upwards
>  - Return to page_cache_ra_unbounded() and we hit the BUG_ON, as nr_pages
>    has been adjusted upwards.

(nb: this has only been reported for a kernel which has readahead_expand().
there is no indication this BUG_ON can be hit by a released kernel)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/readahead: Handle ractl nr_pages being modified
  2021-04-20 20:01 [PATCH] mm/readahead: Handle ractl nr_pages being modified Matthew Wilcox (Oracle)
  2021-04-20 20:06 ` Matthew Wilcox
@ 2021-04-20 20:12 ` Jeff Layton
  2021-04-20 21:03   ` Matthew Wilcox
  2021-04-21 10:34   ` David Howells
  1 sibling, 2 replies; 7+ messages in thread
From: Jeff Layton @ 2021-04-20 20:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), David Howells, linux-fsdevel

On Tue, 2021-04-20 at 21:01 +0100, Matthew Wilcox (Oracle) wrote:
> The BUG_ON that checks whether the ractl is still in sync with the
> local variables can trigger under some fairly unusual circumstances.
> Remove the BUG_ON and resync the loop counter after every call to
> read_pages().
> 
> One way I've seen to trigger it is:
> 
>  - Start out with a partially populated range in the page cache
>  - Allocate some pages and run into an existing page
>  - Send the read request off to the filesystem
>  - The page we ran into is removed from the page cache
>  - readahead_expand() succeeds in expanding upwards
>  - Return to page_cache_ra_unbounded() and we hit the BUG_ON, as nr_pages
>    has been adjusted upwards.
> 
> Reported-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/readahead.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index f02dbebf1cef..989a8e710100 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -198,8 +198,6 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  	for (i = 0; i < nr_to_read; i++) {
>  		struct page *page = xa_load(&mapping->i_pages, index + i);
>  
> 
> 
> 
> -		BUG_ON(index + i != ractl->_index + ractl->_nr_pages);
> -
>  		if (page && !xa_is_value(page)) {
>  			/*
>  			 * Page already present?  Kick off the current batch
> @@ -210,6 +208,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  			 * not worth getting one just for that.
>  			 */
>  			read_pages(ractl, &page_pool, true);
> +			i = ractl->_index + ractl->_nr_pages - index;
>  			continue;
>  		}
>  
> 
> 
> 
> @@ -223,6 +222,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  					gfp_mask) < 0) {
>  			put_page(page);
>  			read_pages(ractl, &page_pool, true);
> +			i = ractl->_index + ractl->_nr_pages - index;
>  			continue;
>  		}
>  		if (i == nr_to_read - lookahead_size)

Thanks Willy, but I think this may not be quite right. A kernel with
this patch failed to boot for me:

[  OK  ] Reached target Basic System.
[   17.431421] virtio_net virtio1 enp1s0: renamed from eth0
[   17.453001] page:00000000d076b336 refcount:2 mapcount:0 mapping:00000000fa98b961 index:0x4 pfn:0x100ff8
[   17.454762] memcg:ffff888115934000
[   17.455337] aops:def_blk_aops ino:fc00000
[   17.456163] flags: 0x17ffffc0020014(uptodate|lru|mappedtodisk)
[   17.457239] raw: 0017ffffc0020014 ffffea0004030048 ffffea0004045f08 ffff8881064d95a0
[   17.458628] raw: 0000000000000004 0000000000000000 00000002ffffffff ffff888115934000
[   17.460032] page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
[   17.461149] ------------[ cut here ]------------
[   17.462070] kernel BUG at include/linux/pagemap.h:912!
[   17.463027] invalid opcode: 0000 [#1] SMP KASAN NOPTI
[   17.463881] CPU: 15 PID: 491 Comm: systemd-udevd Tainted: G            E   T 5.12.0-rc4+ #96
[   17.465205] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2.fc34 04/01/2014
[   17.466549] RIP: 0010:mpage_readahead+0x39e/0x3e0
[   17.472766] Code: f6 fe ff ff a8 01 48 c7 c6 c0 cf b8 a9 4c 0f 45 e2 4c 89 e7 e8 a3 9f e7 ff 0f 0b 48 c7 c6 00 d1 b8 a9 4c 89 e7 e8 92 9f e7 ff <0f> 0b 48 c7 c6 20 cf b8 a9 4c 89 e7 e8 81 9f e7 ff 0f 0b 48 c7 c6
[   17.472772] RSP: 0018:ffff8881202c7718 EFLAGS: 00010292
[   17.472777] RAX: 0000000000000000 RBX: ffff8881202c7b50 RCX: 0000000000000000
[   17.472781] RDX: 1ffff110840bd851 RSI: 0000000000000008 RDI: ffffed1024058e80
[   17.472784] RBP: ffffea000403fe08 R08: 0000000000000036 R09: ffff8884205f57a7
[   17.472787] R10: ffffed10840beaf4 R11: 0000000000000001 R12: ffffea000403fe00
[   17.472790] R13: ffff8881202c7b70 R14: ffff8881202c7b74 R15: ffffea000403fe00
[   17.522835] FS:  00007f15cb4c2380(0000) GS:ffff888420400000(0000) knlGS:0000000000000000
[   17.522841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.522845] CR2: 00007f15cb46e000 CR3: 00000001202e0000 CR4: 00000000003506e0
[   17.522850] Call Trace:
[   17.522856]  ? do_mpage_readpage+0xd80/0xd80
[   17.522873]  ? bdev_disk_changed+0x1d0/0x1d0
[   17.557948]  ? lock_release+0x1e1/0x6b0
[   17.557958]  ? lock_downgrade+0x360/0x360
[   17.557964]  read_pages+0x115/0x3e0
[   17.557972]  ? readahead_expand+0x3a0/0x3a0
[   17.557978]  ? __xa_clear_mark+0xc0/0xc0
[   17.557987]  page_cache_ra_unbounded+0x289/0x420
[   17.590717]  ? read_pages+0x3e0/0x3e0
[   17.590737]  force_page_cache_ra+0x1ae/0x230
[   17.590755]  filemap_get_pages+0x1bf/0xb20
[   17.606466]  ? copy_user_generic_string+0x2c/0x40
[   17.606476]  ? __lock_page_async+0x200/0x200
[   17.606480]  ? copyout+0x7e/0xa0
[   17.606489]  filemap_read+0x195/0x6d0
[   17.606497]  ? filemap_get_pages+0xb20/0xb20
[   17.631844]  ? kvm_sched_clock_read+0x14/0x30
[   17.631852]  ? sched_clock+0x5/0x10
[   17.631858]  ? sched_clock_cpu+0x18/0x110
[   17.631864]  ? __lock_acquire+0x88d/0x2cd0
[   17.631870]  ? generic_file_read_iter+0x3c/0x220
[   17.656724]  new_sync_read+0x257/0x360
[   17.661213]  ? __ia32_sys_llseek+0x1d0/0x1d0
[   17.665642]  ? __cond_resched+0x15/0x30
[   17.669896]  ? inode_security+0x6f/0x90
[   17.674187]  ? avc_policy_seqno+0x28/0x30
[   17.678458]  vfs_read+0x22b/0x290
[   17.682522]  ksys_read+0xb1/0x140




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/readahead: Handle ractl nr_pages being modified
  2021-04-20 20:12 ` Jeff Layton
@ 2021-04-20 21:03   ` Matthew Wilcox
  2021-04-21 16:49     ` Jeff Layton
  2021-04-21 10:34   ` David Howells
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2021-04-20 21:03 UTC (permalink / raw)
  To: Jeff Layton; +Cc: David Howells, linux-fsdevel

On Tue, Apr 20, 2021 at 04:12:57PM -0400, Jeff Layton wrote:
> > @@ -210,6 +208,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  			 * not worth getting one just for that.
> >  			 */
> >  			read_pages(ractl, &page_pool, true);
> > +			i = ractl->_index + ractl->_nr_pages - index;

			i = ractl->_index + ractl->_nr_pages - index - 1;

> > @@ -223,6 +222,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  					gfp_mask) < 0) {
> >  			put_page(page);
> >  			read_pages(ractl, &page_pool, true);
> > +			i = ractl->_index + ractl->_nr_pages - index;

			i = ractl->_index + ractl->_nr_pages - index - 1;

> Thanks Willy, but I think this may not be quite right. A kernel with
> this patch failed to boot for me:

Silly off-by-one errors.  xfstests running against xfs is up to generic/278
with the off-by-one fixed.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/readahead: Handle ractl nr_pages being modified
  2021-04-20 20:12 ` Jeff Layton
  2021-04-20 21:03   ` Matthew Wilcox
@ 2021-04-21 10:34   ` David Howells
  2021-04-21 12:27     ` Matthew Wilcox
  1 sibling, 1 reply; 7+ messages in thread
From: David Howells @ 2021-04-21 10:34 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: dhowells, Jeff Layton, linux-fsdevel

Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, Apr 20, 2021 at 04:12:57PM -0400, Jeff Layton wrote:
> > > @@ -210,6 +208,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > >  			 * not worth getting one just for that.
> > >  			 */
> > >  			read_pages(ractl, &page_pool, true);
> > > +			i = ractl->_index + ractl->_nr_pages - index;
> 
> 			i = ractl->_index + ractl->_nr_pages - index - 1;
> 
> > > @@ -223,6 +222,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > >  					gfp_mask) < 0) {
> > >  			put_page(page);
> > >  			read_pages(ractl, &page_pool, true);
> > > +			i = ractl->_index + ractl->_nr_pages - index;
> 
> 			i = ractl->_index + ractl->_nr_pages - index - 1;
> 
> > Thanks Willy, but I think this may not be quite right. A kernel with
> > this patch failed to boot for me:
> 
> Silly off-by-one errors.  xfstests running against xfs is up to generic/278
> with the off-by-one fixed.

You can add my Tested-by - or do you want me to add it to my patchset?

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/readahead: Handle ractl nr_pages being modified
  2021-04-21 10:34   ` David Howells
@ 2021-04-21 12:27     ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2021-04-21 12:27 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, linux-fsdevel

On Wed, Apr 21, 2021 at 11:34:44AM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Tue, Apr 20, 2021 at 04:12:57PM -0400, Jeff Layton wrote:
> > > > @@ -210,6 +208,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > > >  			 * not worth getting one just for that.
> > > >  			 */
> > > >  			read_pages(ractl, &page_pool, true);
> > > > +			i = ractl->_index + ractl->_nr_pages - index;
> > 
> > 			i = ractl->_index + ractl->_nr_pages - index - 1;
> > 
> > > > @@ -223,6 +222,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > > >  					gfp_mask) < 0) {
> > > >  			put_page(page);
> > > >  			read_pages(ractl, &page_pool, true);
> > > > +			i = ractl->_index + ractl->_nr_pages - index;
> > 
> > 			i = ractl->_index + ractl->_nr_pages - index - 1;
> > 
> > > Thanks Willy, but I think this may not be quite right. A kernel with
> > > this patch failed to boot for me:
> > 
> > Silly off-by-one errors.  xfstests running against xfs is up to generic/278
> > with the off-by-one fixed.
> 
> You can add my Tested-by - or do you want me to add it to my patchset?

I think you need it as part of your patchset, ordered before
readahead_expand().  It probably needs a rewritten description ...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/readahead: Handle ractl nr_pages being modified
  2021-04-20 21:03   ` Matthew Wilcox
@ 2021-04-21 16:49     ` Jeff Layton
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2021-04-21 16:49 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: David Howells, linux-fsdevel

On Tue, 2021-04-20 at 22:03 +0100, Matthew Wilcox wrote:
> On Tue, Apr 20, 2021 at 04:12:57PM -0400, Jeff Layton wrote:
> > > @@ -210,6 +208,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > >  			 * not worth getting one just for that.
> > >  			 */
> > >  			read_pages(ractl, &page_pool, true);
> > > +			i = ractl->_index + ractl->_nr_pages - index;
> 
> 			i = ractl->_index + ractl->_nr_pages - index - 1;
> 
> > > @@ -223,6 +222,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > >  					gfp_mask) < 0) {
> > >  			put_page(page);
> > >  			read_pages(ractl, &page_pool, true);
> > > +			i = ractl->_index + ractl->_nr_pages - index;
> 
> 			i = ractl->_index + ractl->_nr_pages - index - 1;
> 
> > Thanks Willy, but I think this may not be quite right. A kernel with
> > this patch failed to boot for me:
> 
> Silly off-by-one errors.  xfstests running against xfs is up to generic/278
> with the off-by-one fixed.


It worked fine with that change in place. You can add:

Tested-by: Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-20 20:01 [PATCH] mm/readahead: Handle ractl nr_pages being modified Matthew Wilcox (Oracle)
2021-04-20 20:06 ` Matthew Wilcox
2021-04-20 20:12 ` Jeff Layton
2021-04-20 21:03   ` Matthew Wilcox
2021-04-21 16:49     ` Jeff Layton
2021-04-21 10:34   ` David Howells
2021-04-21 12:27     ` Matthew Wilcox

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git