From: Thorsten Leemhuis <>
	"" <>
Subject: Re: Commit 'iomap: add support for dma aligned direct-io' causes qemu/KVM boot failures
Date: Fri, 30 Sep 2022 13:52:55 +0200
Message-ID: <>
In-Reply-To: <>

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker. This might be a Qemu
bug, but it's exposed by a kernel change, so I at least want to have it in
the tracking. I'll simply remove it in a few weeks if it turns out that
nobody except Maxim hits this.

On 29.09.22 17:41, Maxim Levitsky wrote:
> Hi!
> Recently I noticed that this commit broke the boot of some of the VMs that I run on my dev machine.
> It seems that I am not the first to notice this, but in my case it is a bit different.
> My VM is a normal x86 VM, and it uses virtio-blk in the guest to access the virtual disk,
> which is a qcow2 file stored on an ext4 filesystem on an NVMe drive with 4K sectors.
> (However, I was also able to reproduce this on a raw file.)
> It seems that the only two things that are needed to reproduce the issue are:
> 1. The qcow2/raw file has to be located on a drive which has 4K hardware block size.
> 2. Qemu needs to use direct IO (both aio and 'threads' reproduce this). 
> I did some debugging and isolated the kernel change in behavior from qemu's point of view:
> Qemu, when using direct IO, 'probes' the underlying file.
> It probes two things:
> 1. It probes the minimum block size it can read.
>    It does so by trying to read 1, 512, 1024, 2048 and 4096 bytes at offset 0,
>    using a 4096 bytes aligned buffer, and notes the first read that works as the hardware block size.
>    (The relevant function is 'raw_probe_alignment' in src/block/file-posix.c in qemu source code).
> 2. It probes the buffer alignment by reading 4096 bytes also at file offset 0,
>    this time using a buffer that is 1, 512, 1024, 2048 and 4096 aligned
>    (this is done by allocating a buffer which is 4K aligned and adding 1/512 and so on to its address)
>    First successful read is saved as the required buffer alignment. 
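
The two probes above can be modeled in a few lines. This is a simplified Python sketch, not qemu's raw_probe_alignment code. The acceptance rule, the 4096-byte buffer base, and the 512-byte DMA alignment used for the "after" case are all assumptions. They were chosen to be consistent with the results Maxim reports, not read from a real device.

```python
# Simplified model of the two O_DIRECT probes (not qemu's actual code).
# ASSUMPTION: a read is accepted only if the buffer address is a multiple
# of addr_align and both the length and the file offset are multiples of
# the logical block size (len_align).

CANDIDATES = (1, 512, 1024, 2048, 4096)


def accepts(addr, length, offset, addr_align, len_align):
    """Return True if the modeled kernel check would allow this read."""
    return (addr % addr_align == 0
            and length % len_align == 0
            and offset % len_align == 0)


def probe_block_size(addr_align, len_align):
    """Probe 1: read 1, 512, ... bytes at offset 0 into a 4K-aligned buffer."""
    for size in CANDIDATES:
        if accepts(4096, size, 0, addr_align, len_align):
            return size
    return None


def probe_buf_align(addr_align, len_align):
    """Probe 2: read 4096 bytes at offset 0 into a 4K-aligned buffer + 1, +512, ..."""
    for align in CANDIDATES:
        if accepts(4096 + align, 4096, 0, addr_align, len_align):
            return align
    return None


# Before the commit (modeled): buffers must be aligned to the 4K block size.
print("before:", probe_block_size(4096, 4096), probe_buf_align(4096, 4096))
# After the commit (assumed 512-byte DMA alignment): probe 2 stops at 512.
print("after: ", probe_block_size(512, 4096), probe_buf_align(512, 4096))
```

Under these assumptions, the old rule yields 4096 for both probes. The new rule yields 4096 for the block size but only 512 for the buffer alignment.
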
> Before the patch, both probes would yield 4096 and everything would work fine.
> (The file in question is stored on 4K block device)
> After the patch the buffer alignment probe succeeds at 512 bytes.
> This means that the kernel now allows reading 4K of data at file offset 0 with a buffer that
> is only 512 bytes aligned.
> It is worth noting that the probe was done using the 'pread' syscall.
> Later on, qemu likely reads the first 512-byte sector of the drive.
> It uses preadv with 2 io vectors:
> The first one is for 512 bytes and it seems to have a 0xC00 offset into the page
> (likely depends on the debug session, but seems to be consistent).
> The second one is for 3584 bytes and also has a buffer that is not 4K aligned
> (0x200 page offset this time).
> This means that qemu does respect the 4K block size but only respects 512-byte buffer alignment,
> which is consistent with the result of the probing.
> And that preadv fails with -EINVAL
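
This sketch shows why the single-buffer pread passes the probe while this preadv fails. It assumes, without verification here, that after the commit each iovec segment is checked on its own. Under that assumption, every segment's address must meet the DMA alignment and every segment's length must be a multiple of the 4K logical block size. The page base addresses are made up; only the page offsets and lengths come from the report.

```python
# Model of a per-segment alignment check (a sketch, not kernel code).
# ASSUMPTION: each (address, length) segment must have its address aligned
# to addr_align and its length aligned to the logical block size.

PAGE = 4096


def segments_ok(iov, addr_align, len_align):
    """iov is a list of (address, length) pairs; True if every segment passes."""
    return all(addr % addr_align == 0 and length % len_align == 0
               for addr, length in iov)


# Segments as reported: 512 bytes at page offset 0xC00, then 3584 bytes at
# page offset 0x200. The page numbers 10 and 11 are hypothetical.
iov = [(10 * PAGE + 0xC00, 512), (11 * PAGE + 0x200, 3584)]
total = sum(length for _, length in iov)

print("total length:", total)  # 4096, so the request as a whole is 4K-sized
print("per-segment check:", segments_ok(iov, 512, 4096))  # False: 512 % 4096 != 0
print("single 4K buffer:", segments_ok([(10 * PAGE + 0x200, 4096)], 512, 4096))  # True
```

In this model, the total request is block-size aligned and every segment address is 512-byte aligned. The 512-byte first segment alone violates the length rule, which would be consistent with the EINVAL that Maxim sees.
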
> Forcing qemu to use a 4K buffer size fixes the issue, as does reverting the offending commit.
> Any patches or suggestions are welcome.
> I use 6.0-rc7 (mainline master branch as of yesterday).
> Best regards,
> 	Maxim Levitsky

Thanks for the report. To be sure this issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced bf8d08532bc1
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained
in the Linux kernel's documentation; the webpage above explains why this
is important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

Thread overview: 11+ messages
2022-09-29 15:41 Commit 'iomap: add support for dma aligned direct-io' causes qemu/KVM boot failures Maxim Levitsky
2022-09-29 15:48 ` Keith Busch
2022-09-29 16:16   ` Maxim Levitsky
2022-09-29 16:37     ` Keith Busch
2022-09-29 16:39       ` Christoph Hellwig
2022-09-29 17:35         ` Paolo Bonzini
2022-10-02  8:59           ` Maxim Levitsky
2022-10-02 13:56             ` Keith Busch
2022-10-03  7:06               ` Maxim Levitsky
2022-09-30 11:52 ` Thorsten Leemhuis [this message]
2022-11-04 11:59   ` Commit 'iomap: add support for dma aligned direct-io' causes qemu/KVM boot failures #forregzbot Thorsten Leemhuis
