* [Bug 209733] New: Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-10-18 23:09 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

            Bug ID: 209733
           Summary: Starting new KVM virtual machines on PPC64 starts to
                    hang after box is up for a while
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: >=5.8
          Hardware: PPC-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: PPC-64
          Assignee: platform_ppc-64@kernel-bugs.osdl.org
          Reporter: cam@neo-zeon.de
        Regression: No

Issue occurs with 5.8.14, 5.8.16, and 5.9.1.  Does NOT occur with 5.7.x. I
suspect it occurs with all of 5.8, but I haven't confirmed this yet.

After the box has been up for a "while", starting new VMs fails. Completely
shutting down existing VMs and then starting them back up also fails in
the same way.

What is a while? Could be 2 days, might be 9. I'll update as the pattern
becomes more clear.

libvirt is generally used, but when running kvm manually with strace, kvm
always gets stuck here:
ioctl(11, KVM_PPC_ALLOCATE_HTAB, 0x7fffea0bade4

Maybe the kernel is trying to find the memory needed to allocate the Hashed
Page Table but is unable to do so? Maybe there's a memory leak?
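
For a sense of scale, here is a rough sketch of how large that Hashed Page
Table would be for this guest. It assumes QEMU sizes the table at about 1/128
of guest RAM (rounded up to a power of two) and clamps it between 2^18 and 2^46
bytes. That heuristic is an assumption recalled from QEMU's pseries code, not
something stated in this report, and the function name is hypothetical.

```python
def hpt_shift_for_ramsize(ram_bytes: int) -> int:
    """Return log2 of the HPT size, assuming a 1/128-of-RAM heuristic."""
    # Round the RAM size up to a power of two and take its log2.
    pow2_shift = (ram_bytes - 1).bit_length()
    shift = pow2_shift - 7            # 1/128 of RAM (assumption)
    return max(18, min(shift, 46))    # assumed lower and upper bounds


guest_ram = 8192 * 1024 * 1024        # "-m 8192" from the command in this report
shift = hpt_shift_for_ramsize(guest_ram)
print(f"HPT order {shift}: {(1 << shift) // (1024 * 1024)} MiB")
```

Under those assumptions an 8 GiB guest needs a 64 MiB table, and a hash table
of that kind has to be physically contiguous, which is why a long-running host
is a plausible place for the allocation to struggle.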

Before this issue starts occurring, I have confirmed I am able to run the exact
same kvm command manually:
sudo -u libvirt-qemu qemu-system-ppc64 -enable-kvm -m 8192 -nographic \
    -vga none \
    -drive file=/var/lib/libvirt/images/test.qcow2,format=qcow2 \
    -mem-prealloc -smp 4

Nothing in dmesg, nothing useful in the logs.

This box's configuration:
Debian 10 stable
2x 18 core POWER9 (144 threads)
512g physical memory
Raptor Talos II motherboard
radix MMU disabled

Unfortunately, I cannot test the affected box with the Radix MMU enabled
because I have some important VMs that won't run unless it is disabled.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-10-18 23:13 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

Cameron (cam@neo-zeon.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|PPC-64                      |kvm
            Version|2.5                         |unspecified
            Product|Platform Specific/Hardware  |Virtualization


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-10-30 17:46 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

--- Comment #1 from Cameron (cam@neo-zeon.de) ---
Still happens with 5.9.2.


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-11-07 16:36 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

--- Comment #2 from Cameron (cam@neo-zeon.de) ---
Verified this happens with 5.9.6 and the Debian vendor kernel
linux-image-5.9.0-1-powerpc64le.

Might also be worth mentioning this is occurring with qemu-system-ppc package
version 1:3.1+dfsg-8+deb10u8.


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-11-08 16:33 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

--- Comment #3 from Cameron (cam@neo-zeon.de) ---
Same issue now that I'm running with qemu-system-ppc version 1:5.0-14~bpo10+1
from Debian backports.


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-11-26 17:26 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

--- Comment #4 from Cameron (cam@neo-zeon.de) ---
After enough testing, I feel confident that this issue was fixed in 5.9.9.
However, I ran into XFS problems with 5.9.9 and 5.9.10 (mainly on POWER, and
to a lesser extent on amd64). With 5.9.11 the hang is fixed, and I have seen
no other issues (XFS or otherwise) in over 2 days!

I feel confident in closing this issue.


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-11-26 17:26 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

Cameron (cam@neo-zeon.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-11-26 23:16 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael@ellerman.id.au

--- Comment #5 from Michael Ellerman (michael@ellerman.id.au) ---
Thanks for persisting with the testing.

I wonder if it was fixed by:

c4629e4e7e09 ("mm/compaction: stop isolation if too many pages are isolated and
we have pages to migrate")
or
38935861d85a ("mm/compaction: count pages and stop correctly during page
isolation")

They fix a potential infinite loop in a path that's used by the HTAB allocation.

Those landed in v5.9.9, and fix a commit that was introduced in v5.7 (which
doesn't match your observation that v5.7.x was OK).
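
Since the suspected loop is in page isolation, one cheap thing to look at on an
affected host is the isolated-page counters in /proc/vmstat. The sketch below
is only a diagnostic aid under stated assumptions: the helper names are
hypothetical, and the "larger than all of physical memory" check is a heuristic
chosen for illustration, not a test the kernel itself performs.

```python
import os
from pathlib import Path


def parse_isolated_counters(vmstat_text: str) -> dict:
    """Extract the nr_isolated_* counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name.startswith("nr_isolated_"):
            counters[name] = int(value)
    return counters


def looks_corrupted(value: int, total_pages: int) -> bool:
    """Heuristic: more isolated pages than physical pages cannot be genuine."""
    return value < 0 or value > total_pages


if __name__ == "__main__":
    vmstat = Path("/proc/vmstat")
    if vmstat.exists():
        total_pages = os.sysconf("SC_PHYS_PAGES")
        for name, value in parse_isolated_counters(vmstat.read_text()).items():
            flag = "SUSPICIOUS" if looks_corrupted(value, total_pages) else "ok"
            print(f"{name} {value} {flag}")
```

A counter that stays implausibly high while nothing is being reclaimed or
compacted would be consistent with isolation accounting having gone wrong.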


* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-11-27  2:26 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #6 from Michael Ellerman (michael@ellerman.id.au) ---
Nick pointed out that it was actually:
  2da9f6305f30 ("mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit")
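
As far as I understand that commit's description, the corruption comes from
negating a 32-bit unsigned page count and then widening the result to a signed
64-bit value. The sketch below only reproduces that arithmetic with ctypes. The
variable names and the page count are hypothetical, and this is an
illustration of the integer behaviour, not the kernel code itself.

```python
import ctypes

nr_reclaimed = 32  # hypothetical number of pages to subtract from the counter

# Negating a 32-bit unsigned value wraps modulo 2**32 ...
negated_u32 = ctypes.c_uint32(-nr_reclaimed).value
# ... and widening that to a signed 64-bit long leaves it large and positive.
delta_buggy = ctypes.c_int64(negated_u32).value

# Widening to a signed 64-bit value before negating gives the intended delta.
delta_fixed = -ctypes.c_int64(nr_reclaimed).value

print(delta_buggy)  # 4294967264
print(delta_fixed)  # -32
```

If the counter is bumped by roughly 2^32 instead of being decremented, a
"too many isolated pages" check could plausibly stay true indefinitely, which
would fit a hang that only appears after the box has been up for some time.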
