linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: willy@infradead.org, hughd@google.com, cfijalkovich@google.com,
	song@kernel.org, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] fs: buffer: check huge page size instead of single page for invalidatepage
Date: Fri, 17 Sep 2021 13:57:31 -0700	[thread overview]
Message-ID: <20210917205731.262693-1-shy828301@gmail.com> (raw)

Hao Sun reported the below BUG on v5.15-rc1 kernel:

kernel BUG at fs/buffer.c:1510!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.14.0+ #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Workqueue: events delayed_fput
RIP: 0010:block_invalidatepage+0x27f/0x2a0 -origin/fs/buffer.c:1510
Code: ff ff e8 b4 07 d7 ff b9 02 00 00 00 be 02 00 00 00 4c 89 ff 48
c7 c2 40 4e 25 84 e8 2b c2 c4 02 e9 c9 fe ff ff e8 91 07 d7 ff <0f> 0b
e8 8a 07 d7 ff 0f 0b e8 83 07 d7 ff 48 8d 5d ff e9 57 ff ff
RSP: 0018:ffffc9000065bb60 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffea0000670000 RCX: 0000000000000000
RDX: ffff8880097fa240 RSI: ffffffff81608a9f RDI: ffffea0000670000
RBP: ffffea0000670000 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc9000065b9f8 R11: 0000000000000003 R12: ffffffff81608820
R13: ffffc9000065bc68 R14: 0000000000000000 R15: ffffc9000065bbf0
FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4aef93fb08 CR3: 0000000108cf2000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 do_invalidatepage -origin/mm/truncate.c:157 [inline]
 truncate_cleanup_page+0x15c/0x280 -origin/mm/truncate.c:176
 truncate_inode_pages_range+0x169/0xc30 -origin/mm/truncate.c:325
 kill_bdev.isra.29+0x28/0x30
 blkdev_flush_mapping+0x4c/0x130 -origin/block/bdev.c:658
 blkdev_put_whole+0x54/0x60 -origin/block/bdev.c:689
 blkdev_put+0x6f/0x210 -origin/block/bdev.c:953
 blkdev_close+0x25/0x30 -origin/block/fops.c:459
 __fput+0xdf/0x380 -origin/fs/file_table.c:280
 delayed_fput+0x25/0x40 -origin/fs/file_table.c:308
 process_one_work+0x359/0x850 -origin/kernel/workqueue.c:2297
 worker_thread+0x41/0x4d0 -origin/kernel/workqueue.c:2444
 kthread+0x178/0x1b0 -origin/kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 -origin/arch/x86/entry/entry_64.S:295
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 9dbb8f58f2109f10 ]---
RIP: 0010:block_invalidatepage+0x27f/0x2a0 -origin/fs/buffer.c:1510
Code: ff ff e8 b4 07 d7 ff b9 02 00 00 00 be 02 00 00 00 4c 89 ff 48
c7 c2 40 4e 25 84 e8 2b c2 c4 02 e9 c9 fe ff ff e8 91 07 d7 ff <0f> 0b
e8 8a 07 d7 ff 0f 0b e8 83 07 d7 ff 48 8d 5d ff e9 57 ff ff
RSP: 0018:ffffc9000065bb60 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffea0000670000 RCX: 0000000000000000
RDX: ffff8880097fa240 RSI: ffffffff81608a9f RDI: ffffea0000670000
RBP: ffffea0000670000 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc9000065b9f8 R11: 0000000000000003 R12: ffffffff81608820
R13: ffffc9000065bc68 R14: 0000000000000000 R15: ffffc9000065bbf0
FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff98674f000 CR3: 0000000106b2e000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554

The debugging showed the page passed to invalidatepage is a huge page
and the length is the size of huge page instead of single page due to
read only FS THP support.  But block_invalidatepage() would throw BUG if
the size is greater than single page.

However there is actually a bigger problem in invalidatepage().  All the
implementations are *NOT* THP aware and hardcoded PAGE_SIZE.  Some triggers
BUG(), like block_invalidatepage(), some just returns error if length is
greater than PAGE_SIZE.

Converting PAGE_SIZE to thp_size() actually is not enough since the actual
invalidation part just assumes single page is passed in.  Since other subpages
may have private too because PG_private is per subpage so there may be
multiple subpages have private.  This may prevent the THP from splitting
and reclaiming since the extra refcount pins from private of subpages.

The complete fix seems not trivial and involve how to deal with huge
page in page cache.  So the scope of this patch is limited to close the
BUG at the moment.

Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Tested-by: Hao Sun <sunhao.th@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 fs/buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index ab7573d72dd7..4bcb54c4d1be 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1507,7 +1507,7 @@ void block_invalidatepage(struct page *page, unsigned int offset,
 	/*
 	 * Check for overflow
 	 */
-	BUG_ON(stop > PAGE_SIZE || stop < length);
+	BUG_ON(stop > thp_size(page) || stop < length);
 
 	head = page_buffers(page);
 	bh = head;
@@ -1535,7 +1535,7 @@ void block_invalidatepage(struct page *page, unsigned int offset,
 	 * The get_block cached value has been unconditionally invalidated,
 	 * so real IO is not possible anymore.
 	 */
-	if (length == PAGE_SIZE)
+	if (length >= PAGE_SIZE)
 		try_to_release_page(page, 0);
 out:
 	return;
-- 
2.26.2



             reply	other threads:[~2021-09-17 20:57 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-17 20:57 Yang Shi [this message]
2021-09-18  0:07 ` [PATCH] fs: buffer: check huge page size instead of single page for invalidatepage Yang Shi
2021-09-19 14:40   ` Matthew Wilcox
2021-09-20 21:23     ` Yang Shi
2021-09-20 21:50       ` Matthew Wilcox
2021-09-20 22:35         ` Yang Shi
2021-10-11 19:57           ` Yang Shi
2021-10-20 23:38             ` Yang Shi
2021-10-20 23:49               ` Matthew Wilcox
2021-10-21  0:24                 ` Yang Shi
2021-10-21  1:36                   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210917205731.262693-1-shy828301@gmail.com \
    --to=shy828301@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cfijalkovich@google.com \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=song@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).