linux-block.vger.kernel.org archive mirror
From: Greg Edwards <gedwards@ddn.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [PATCH] block: Remove special-casing of compound pages
Date: Thu, 29 Feb 2024 11:25:13 -0700
Message-ID: <20240229182513.GA17355@bobdog.home.arpa>
In-Reply-To: <170198306635.1954272.10907610290128291539.b4-ty@kernel.dk>

On Thu, Dec 07, 2023 at 02:04:26PM -0700, Jens Axboe wrote:
> On Mon, 14 Aug 2023 15:41:00 +0100, Matthew Wilcox (Oracle) wrote:
>> The special casing was originally added in pre-git history; reproducing
>> the commit log here:
>>
>>> commit a318a92567d77
>>> Author: Andrew Morton <akpm@osdl.org>
>>> Date:   Sun Sep 21 01:42:22 2003 -0700
>>>
>>>     [PATCH] Speed up direct-io hugetlbpage handling
>>>
>>>     This patch short-circuits all the direct-io page dirtying logic for
>>>     higher-order pages.  Without this, we pointlessly bounce BIOs up to
>>>     keventd all the time.
>>
>> [...]
>
> Applied, thanks!
>
> [1/1] block: Remove special-casing of compound pages
>       commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55

This commit results in a change of behavior for QEMU VMs backed by hugepages
that open their VM disk image file with O_DIRECT (QEMU cache=none or
cache.direct=on options).  When the VM shuts down and the QEMU process exits,
one or two hugepages may fail to free correctly.  It appears to be a race, as
it doesn't happen every time.

Debugging on 6.8-rc6 shows that, when it occurs, the hugepage that fails to
free still has a non-zero refcount when it reaches the
folio_put_testzero(folio) test in release_pages().  In one failing test
iteration with 1 GiB hugepages, the folio had a mapcount of 0, a refcount of
35, and folio_maybe_dma_pinned() returned true.

The problem only occurs when the VM disk image file is opened with O_DIRECT.
When using QEMU cache=writeback or cache.direct=off options, it does not occur.
We first noticed it on the 6.1.y stable kernel when this commit landed there
(6.1.75).

A very simple reproducer without KVM (just boot the VM, then shut it down):

echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
qemu-system-x86_64 \
	-cpu qemu64 \
	-m 1024 \
	-nographic \
	-mem-path /dev/hugepages/vm00 \
	-mem-prealloc \
	-drive file=test.qcow2,if=none,cache=none,id=drive0 \
	-device virtio-blk-pci,drive=drive0,id=disk0,bootindex=1
rm -f /dev/hugepages/vm00
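
One way to check for the leak after the reproducer finishes is to compare the
pool counters in sysfs.  The following is only a rough sketch and was not part
of the original testing: the leaked_hugepages helper and the HP_DIR variable
are hypothetical names, and it assumes nothing else on the system is using
huge pages of this size, so any page still in use after QEMU exits and the
backing file is removed is one that failed to free.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print how many huge
# pages in a pool are still in use, computed as nr_hugepages - free_hugepages.
# $1 is a directory containing those two counter files.
leaked_hugepages() {
    total=$(cat "$1/nr_hugepages")
    free=$(cat "$1/free_hugepages")
    echo $((total - free))
}

# HP_DIR is an assumed override; the default matches the 2 MiB pool used in
# the reproducer above.
hp_dir=${HP_DIR:-/sys/kernel/mm/hugepages/hugepages-2048kB}

# Only report when the counters are readable on this machine.
if [ -r "$hp_dir/nr_hugepages" ] && [ -r "$hp_dir/free_hugepages" ]; then
    echo "huge pages still in use: $(leaked_hugepages "$hp_dir")"
fi
```

Run it once before starting QEMU and again after the rm; under the assumption
above, a non-zero value on the second run would indicate the failure.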

Some testing notes:

  * occurs with 6.1.75, 6.6.14, 6.8-rc6, and linux-next-20240229
  * occurs with 1 GiB and 2 MiB huge pages, with both hugetlbfs and memfd
  * occurs with QEMU 8.0.y, 8.1.y, 8.2.y, and master
  * occurs with (-enable-kvm -cpu host) or without (-cpu qemu64) KVM

Thanks for your time!

Greg


Thread overview: 12+ messages
2023-08-14 14:41 [PATCH] block: Remove special-casing of compound pages Matthew Wilcox (Oracle)
2023-08-14 14:48 ` Hannes Reinecke
2023-08-16 17:03 ` Fix rare user data corruption when using THP Matthew Wilcox
2023-08-16 20:27 ` [PATCH] block: Remove special-casing of compound pages Hugh Dickins
2023-09-15 14:21   ` Matthew Wilcox
2023-09-15 22:48     ` Hugh Dickins
2023-12-07 21:04 ` Jens Axboe
2024-02-29 18:25   ` Greg Edwards [this message]
2024-02-29 19:37     ` Matthew Wilcox
2024-02-29 20:05       ` Greg Edwards
2023-12-07 22:10 ` Keith Busch
2023-12-07 23:57   ` Matthew Wilcox
