From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA368C433EF for ; Thu, 21 Jul 2022 12:58:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232925AbiGUM6t (ORCPT ); Thu, 21 Jul 2022 08:58:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58272 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229967AbiGUM6r (ORCPT ); Thu, 21 Jul 2022 08:58:47 -0400 X-Greylist: delayed 567 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Thu, 21 Jul 2022 05:58:46 PDT Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5360B459BE for ; Thu, 21 Jul 2022 05:58:46 -0700 (PDT) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 45A2040569; Thu, 21 Jul 2022 14:49:15 +0200 (CEST) Message-ID: Date: Thu, 21 Jul 2022 14:49:14 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Content-Language: en-US From: Fabian Ebner To: kvm@vger.kernel.org, qemu-devel@nongnu.org Cc: Thomas Lamprecht , Mira Limbeck Subject: Guest reboot issues since QEMU 6.0 and Linux 5.11 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Hi, since about half a year ago, we're getting user reports about guest reboot issues with KVM/QEMU[0]. The most common scenario is a Windows Server VM (2012R2/2016/2019, UEFI/OVMF and SeaBIOS) getting stuck during the screen with the Windows logo and the spinning circles after a reboot was triggered from within the guest. Quitting the kvm process and booting with a fresh instance works. The issue seems to become more likely, the longer the kvm instance runs. We did not get such reports while we were providing Linux 5.4 and QEMU 5.2.0, but we do with Linux 5.11/5.13/5.15 and QEMU 6.x. I'm just wondering if anybody has seen this issue before or might have a hunch what it's about? Any tips on what to look out for when debugging are also greatly appreciated! We do have debug access to a user's test VM and the VM state was saved before a problematic reboot, but I can't modify the host system there. AFAICT QEMU just executes guest code as usual, but I'm really not sure what to look out for. That VM has CPU type host, and a colleague did have a similar enough CPU to load the VM state, but for him, the reboot went through normally. On the user's system, it triggers consistently after loading the VM state and rebooting. So unfortunately, we didn't manage to reproduce the issue locally yet. With two other images provided by users, we ran into a boot loop, where QEMU resets the CPUs and does a few KVM_RUNs before the exit reason is KVM_EXIT_SHUTDOWN (which to my understanding indicates a triple fault) and then it repeats. It's not clear if the issues are related. There are also a few reports about non-Windows VMs, mostly Ubuntu 20.04 with UEFI/OVMF, but again, it's not clear if the issues are related. [0]: https://forum.proxmox.com/threads/100744/ (the forum thread is a bit chaotic unfortunately). Best Regards, Fabi