* [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
@ 2020-08-24 16:35 bugzilla-daemon
  2020-08-24 20:43 ` [Bug 209025] " bugzilla-daemon
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-24 16:35 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

            Bug ID: 209025
           Summary: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug
                    is back
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.9 rc1 and rc2
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: rmuncrief@humanavance.com
        Regression: No

Created attachment 292143
  --> https://bugzilla.kernel.org/attachment.cgi?id=292143&action=edit
VM Fail Log

My primary Windows 10 VM uses GPU/SATA/USB passthrough, and it appears a
regression has been introduced in kernel 5.9. The VM will not start with either
rc1 or rc2 because of the "VFIO_MAP_DMA failed: Cannot allocate memory" error
on all passthrough devices.

I'm running an R7 3700X CPU, ASUS TUF Gaming X570-Plus MB, 16GB PC3200 DDR4,
and two RX 580 GPUs (one dedicated to the VM passthrough). I've attached the
relevant log excerpt to this report.

By the way, the log shows only one device failing because QEMU/KVM exits on the
first error. But I reordered the VM devices so that each one was encountered
first in turn, and each failed in the same way, so it's not device-related.

The VM works fantastically with kernels 5.8.3 and 5.4.60.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
@ 2020-08-24 20:43 ` bugzilla-daemon
  2020-08-24 20:57 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-24 20:43 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

Alex Williamson (alex.williamson@redhat.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alex.williamson@redhat.com

--- Comment #1 from Alex Williamson (alex.williamson@redhat.com) ---
There's another similar report here:

https://lore.kernel.org/kvm/6d0a5da6-0deb-17c5-f8f5-f8113437c2d6@linux.ibm.com/

I don't seem to be able to reproduce on EPYC.  Is there any chance you could
bisect it?


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
  2020-08-24 20:43 ` [Bug 209025] " bugzilla-daemon
@ 2020-08-24 20:57 ` bugzilla-daemon
  2020-08-25  0:53 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-24 20:57 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #2 from muncrief (rmuncrief@humanavance.com) ---
Oh, that's interesting. Well, at least we know it doesn't have anything to do
with pinning since my VM is pinless :)

In any case I'll go ahead and bisect it and see if I can identify the bad
commit.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
  2020-08-24 20:43 ` [Bug 209025] " bugzilla-daemon
  2020-08-24 20:57 ` bugzilla-daemon
@ 2020-08-25  0:53 ` bugzilla-daemon
  2020-08-25  3:14 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25  0:53 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #3 from muncrief (rmuncrief@humanavance.com) ---
Unfortunately bisect failed, and in a very odd way. I've bisected the kernel
numerous times over the decades, but this time it didn't work correctly from
the start because of module directories it wanted to delete that weren't there.

To make a long story short, after the initial compile of 5.9-rc1 I did the
normal bisect start and good/bad version definition. But when making the first
bisect it failed during the final phase of the modules process with errors
saying it couldn't delete
"pkg/linux-bisect/usr/lib/modules/5.9.0-rc1-1-bisect/source" or
"pkg/linux-bisect/usr/lib/modules/5.9.0-rc1-1-bisect/build". And when I looked
they didn't exist.

So I spent a few hours trying numerous things, but could never get bisect to
work as expected. In the end I just timed the manual creation of the
directories correctly as the modules process was completing, but then bisect
complained they were directories. So I tried again but this time just touched
to make files instead of directories, and bisect completed.

Perplexed but undeterred I installed and ran the bisected kernel, the VM
worked, and I marked the bisect as good. But when compiling the next bisect the
same thing happened, and I did the same thing to fix it. However this time when
I installed and ran the kernel my VM seemed to boot, but actually didn't.
Neither the QXL nor the passthrough GPU display came on, and I couldn't shut it
down. I just had to do a power off.

So I rebooted with my working 5.8.3 kernel and was surprised that my entire VM
disk was completely erased. There were no partitions at all, it was just blank.
Of course I made a backup before doing all this so it was easy to restore, but
it's the first time I've ever seen anything like it.

In any case, the disk is attached to a passthrough Phison NVME controller, so I
assume there was some kind of different, silent, VFIO error that wiped out the
disk.

In summary, I have no idea what's going on. Of course sometimes bisect works
and sometimes it doesn't, and the kernel is the most difficult and dangerous to
bisect, but I've never seen actual process errors like this before. Compilation
errors yes, but not missing source or package files and directories.

I'm hoping, and assuming, this is some kind of pilot error on my part. If so,
and someone knows what it is, just tell me what it is and I'll give it another
try. By the way, I'm running Arch with all the latest updates.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-08-25  0:53 ` bugzilla-daemon
@ 2020-08-25  3:14 ` bugzilla-daemon
  2020-08-25  7:26 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25  3:14 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #4 from muncrief (rmuncrief@humanavance.com) ---
Oh yeah, as I assumed, it was pilot error. After I finished my other tasks for
the day and had a few minutes to concentrate on the bisect output I realized I
probably had to compile the exact initial version for bisect to work in Arch.
And indeed once I created a custom PKGBUILD the first bisect compilation
completed without error.

It's too late to continue this evening, but I don't have any tasks scheduled
for the first part of the day tomorrow so I'll concentrate solely on the bisect
and hopefully get a bit further this time.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-08-25  3:14 ` bugzilla-daemon
@ 2020-08-25  7:26 ` bugzilla-daemon
  2020-08-25 14:32 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25  7:26 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

Niklas Schnelle (niklas@komani.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |axboe@kernel.dk,
                   |                            |niklas@komani.de

--- Comment #5 from Niklas Schnelle (niklas@komani.de) ---
Hi,

It's me, Niklas, from the KVM mailing list discussion, and yes,
this is a very old, pre-IBM, pre-any-work Bugzilla account :D

I too did a bisect yesterday and also
encountered a few commits that left KVM in a very weird state
where not even the UEFI in the VM would boot. Funnily enough,
a BIOS-based FreeBSD VM still booted.

Anyway, my bisect was successful, and reverting the commit it
found makes things work even on v5.9-rc2.

That said, it is quite a strange result, but I guess it makes
sense as that commit also deals with locked/pinned memory.
I'm assuming this might use the same accounting mechanism?

f74441e6311a28f0ee89b9c8e296a33730f812fc is the first bad commit
commit f74441e6311a28f0ee89b9c8e296a33730f812fc
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Aug 5 13:00:44 2020 -0600

    io_uring: account locked memory before potential error case

    The tear down path will always unaccount the memory, so ensure that we
    have accounted it before hitting any of them.

    Reported-by: Tomáš Chaloupka <chalucha@gmail.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

 fs/io_uring.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

I've added Jens to the Bugzilla CC list, though I'm not sure he'll see
that.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-08-25  7:26 ` bugzilla-daemon
@ 2020-08-25 14:32 ` bugzilla-daemon
  2020-08-25 14:32 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 14:32 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #6 from Jens Axboe (axboe@kernel.dk) ---
(In reply to muncrief from comment #4)

I'm attaching the patch that should fix this. muncrief, I like to provide
proper attribution in patches, would you be willing to share your name and
email so I can add it to the patch? If you prefer not to that's totally fine as
well, just wanted to give you the option.

Attaching the patch after this comment.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-08-25 14:32 ` bugzilla-daemon
@ 2020-08-25 14:32 ` bugzilla-daemon
  2020-08-25 16:31 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 14:32 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #7 from Jens Axboe (axboe@kernel.dk) ---
Created attachment 292167
  --> https://bugzilla.kernel.org/attachment.cgi?id=292167&action=edit
Fix sqo_mm accounting


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-08-25 14:32 ` bugzilla-daemon
@ 2020-08-25 16:31 ` bugzilla-daemon
  2020-08-25 17:09 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 16:31 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #8 from Robert M. Muncrief (rmuncrief@humanavance.com) ---
(In reply to Jens Axboe from comment #6)
> (In reply to muncrief from comment #4)
> 
> I'm attaching the patch that should fix this. muncrief, I like to provide
> proper attribution in patches, would you be willing to share your name and
> email so I can add it to the patch? If you prefer not to that's totally fine
> as well, just wanted to give you the option.
> 
> Attaching the patch after this comment.

Awesome Jens! Thank you for figuring this thing out. I'll try the patch as soon
as I'm done with breakfast. And sharing my name and email is fine. I changed my
account to my full name (Robert M. Muncrief).

By the way, for future reference, was my assumption correct that I have to
compile the exact initial kernel version before starting the bisect? I switched to
Manjaro three or four years ago, and then to Arch about two years ago, but I don't
recall having to do it that way before. Then again, I'm not sure I've
ever bisected the kernel on Arch; it may just have been on Manjaro and Xubuntu.

And hey, don't laugh! I'm old! And my memory sure isn't what it used to be ...
:)


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-08-25 16:31 ` bugzilla-daemon
@ 2020-08-25 17:09 ` bugzilla-daemon
  2020-08-25 17:13 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 17:09 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #9 from Niklas Schnelle (niklas@komani.de) ---
Hi Robert,

git does not know what you compiled, so you could just run
"git bisect start; git bisect good v5.8; git bisect bad v5.9-rc1".
That said, it is of course best to always compile and test the versions
you tell git are (not) working.

I'm a fellow Arch Linux user (on all my private machines) and actually suspect
its current QEMU and other package versions were necessary to expose this bug
and are the reason Alex could not reproduce this.

I did not do the git bisect with PKGBUILDs, though. Instead, I
have a custom systemd-boot entry and set CONFIG_LOCALVERSION="-niklas" in the
.config. Then I used the following commands:

cd linux
zcat /proc/config.gz > .config # once, to get the Arch config
make oldconfig
make -j 24
sudo make modules_install -j INSTALL_MOD_STRIP=1
sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-linux-niklas
sudo mkinitcpio -p linux-niklas

The last part is Arch-specific; on other distros there is a special
installkernel script that does the copy to /boot, rebuilds the initramfs, and
also creates bootloader entries.
Also, only add the strip flag if you don't need debug symbols
in modules.

The manual cp/modules_install of course means I have
to delete the /usr/lib/modules/.. folders manually but they all have "niklas"
in the name so that's easy enough ;-)
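
As a rough sketch of that cleanup step, the helper below lists the module
directories whose name contains a given local-version string, so they can be
reviewed before anything is deleted. The function name
list_custom_module_dirs and its two parameters are hypothetical; the match is
a "contains" test on the assumption that extra version text may follow the
suffix in the directory name.

```shell
# Print every directory under a modules root whose name contains a suffix.
# Usage: list_custom_module_dirs <modules_root> <suffix>
list_custom_module_dirs() {
    modules_root=$1
    suffix=$2
    for dir in "$modules_root"/*"$suffix"*; do
        # An unmatched glob stays literal, so check that it is a real directory.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    return 0
}
```

For example, "list_custom_module_dirs /usr/lib/modules niklas" would only print
candidates; removing them (for instance with "sudo rm -r") is left as a
deliberate manual step.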


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (8 preceding siblings ...)
  2020-08-25 17:09 ` bugzilla-daemon
@ 2020-08-25 17:13 ` bugzilla-daemon
  2020-08-25 17:47 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 17:13 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #10 from Jens Axboe (axboe@kernel.dk) ---
> I'm a fellow Arch Linux user (on all my private machines) and actually
> suspect
> its current QEMU and other package versions were necessary to expose this bug
> and are the reason Alex could not reproduce this.

Newer QEMU versions use io_uring for faster I/O, which is why you'd see it.
If you're not using io_uring at all, you would not trigger the imbalance.
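
For anyone who wants to check whether locked-memory accounting is involved on
their own system, a rough sketch follows. It assumes, as the thread suggests
but does not spell out, that VFIO DMA mappings are charged against the same
per-process locked-memory counter that the io_uring imbalance disturbs. The
VmLck field of /proc/<pid>/status reports that counter in kB. The helper names
locked_kb and fits_memlock are made up for illustration, and the check is a
simplified model, not the kernel's actual logic.

```shell
# Print the VmLck value (kB) from a /proc/<pid>/status-style file.
locked_kb() {
    awk '$1 == "VmLck:" { print $2 }' "$1"
}

# Succeed if current locked memory plus a requested mapping fits the limit.
# Usage: fits_memlock <status_file> <request_kb> <limit_kb|unlimited>
fits_memlock() {
    status_file=$1
    request_kb=$2
    limit_kb=$3
    if [ "$limit_kb" = "unlimited" ]; then
        return 0
    fi
    used_kb=$(locked_kb "$status_file")
    [ $((used_kb + request_kb)) -le "$limit_kb" ]
}
```

With QEMU_PID set to the PID of the QEMU process (a placeholder here),
something like
fits_memlock "/proc/$QEMU_PID/status" 0 "$(ulimit -l)"
would show whether the process is already at or over its limit before any new
mapping is requested. An implausibly large VmLck value would point at an
accounting problem rather than at real memory pressure.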


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (9 preceding siblings ...)
  2020-08-25 17:13 ` bugzilla-daemon
@ 2020-08-25 17:47 ` bugzilla-daemon
  2020-08-25 17:55 ` bugzilla-daemon
  2020-08-25 18:06 ` bugzilla-daemon
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 17:47 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #11 from Robert M. Muncrief (rmuncrief@humanavance.com) ---
Fantastic work Jens! I just tested this patch and my VM ran perfectly, and
there were zero dmesg error or fail messages.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (10 preceding siblings ...)
  2020-08-25 17:47 ` bugzilla-daemon
@ 2020-08-25 17:55 ` bugzilla-daemon
  2020-08-25 18:06 ` bugzilla-daemon
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 17:55 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #12 from Robert M. Muncrief (rmuncrief@humanavance.com) ---
Thank you for the information Niklas. I could swear I'd bisected the kernel at
least once on Arch, and know I did a few times on Manjaro, and I always started
by compiling the bad version. But I must at least be wrong about Arch, because
if I'd done anything special I would have written it down in my install notes.

In any case I've written your info in my notes and I'll also try to pay more
attention to what I'm doing next time.


* [Bug 209025] The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
  2020-08-24 16:35 [Bug 209025] New: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back bugzilla-daemon
                   ` (11 preceding siblings ...)
  2020-08-25 17:55 ` bugzilla-daemon
@ 2020-08-25 18:06 ` bugzilla-daemon
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2020-08-25 18:06 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=209025

--- Comment #13 from Jens Axboe (axboe@kernel.dk) ---
Thanks everyone, fix is queued up:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=6b7898eb180df12767933466b7855b23103ad489
