Date: Tue, 4 Apr 2017 08:28:51 -0600
From: Chris Friesen
To: Ladi Prosek, Stefan Hajnoczi
Cc: qemu-devel, "Dr. David Alan Gilbert"
Subject: Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0

On 04/04/2017 07:56 AM, Ladi Prosek wrote:
> On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi wrote:
>> On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
>>> Initially we have a bunch of guests running on compute-2 (which is running
>>> qemu-kvm-ev 2.3.0). We then started live-migrating them one at a time to
>>> compute-0 (which is running qemu-kvm-ev 2.6.0). Three of them migrated
>>> successfully. The fourth (which was essentially identical in configuration
>>> to the first three) failed, as per the following logs in
>>> /var/log/libvirt/qemu/instance-0000000e.log:
>>>
>>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c
>>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:07.0/virtio-balloon'
>>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not permitted
>>> 2017-03-29 06:38:37.896+0000: shutting down
>>>
>>> Does anyone know of an existing bug report covering this issue? (I took a
>>> look and didn't see anything obviously related.)
>>
>> This is the virtio-balloon device. If you remove the device the live
>> migration should work reliably.
>>
>> Alternatively, you can temporarily rmmod virtio_balloon inside the guest
>> for live migration. After migration you can modprobe virtio_balloon
>> again.
>>
>> last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
>> I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
>> qemu.git/master and do not see an obvious bug. I also compared
>> qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
>
> The device likely got into the invalid state as part of a previous
> migration to an unfixed QEMU. I second Stefan's suggestion to
> temporarily remove the device or unload the driver.

I'll give that a try (been busy with a separate issue).

If I have a guest already running, can I unilaterally hot-remove the device
from the host side, or does the guest need to be involved as well? (I'm just
trying to figure out how to deal with existing guests.)

Thanks,
Chris
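Why "last_avail_idx 0x47b with used_idx 0x47c" is an invalid state can be worked out by hand. The virtqueue ring indices are free-running 16-bit counters, so the number of requests the device has taken but not yet completed is (last_avail_idx - used_idx) mod 2^16. Here that is (0x47b - 0x47c) mod 0x10000 = 0xffff, far more than the 0x80 entries VQ 2 can hold, so the destination rejects the incoming state. The C below is a simplified sketch modeled on the sanity check that prints the "VQ 2 size 0x80 < ..." message; the function names vq_inuse() and vq_state_valid() are hypothetical, and this is an illustration, not QEMU's actual source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not QEMU code): number of requests "in flight",
 * i.e. taken from the avail ring but not yet returned on the used ring.
 * The indices wrap at 2^16, so the difference is taken modulo 2^16 by
 * casting back to uint16_t.
 */
uint16_t vq_inuse(uint16_t last_avail_idx, uint16_t used_idx)
{
    return (uint16_t)(last_avail_idx - used_idx);
}

/*
 * Hypothetical sketch of the load-time sanity check: a queue with
 * num entries can never have more than num requests in flight.
 */
bool vq_state_valid(unsigned int num, uint16_t last_avail_idx,
                    uint16_t used_idx)
{
    uint16_t inuse = vq_inuse(last_avail_idx, used_idx);

    if (inuse > num) {
        fprintf(stderr, "VQ size 0x%x < last_avail_idx 0x%x - used_idx 0x%x\n",
                num, (unsigned int)last_avail_idx, (unsigned int)used_idx);
        return false;
    }
    return true;
}
```

With the values from the log, vq_inuse(0x47b, 0x47c) yields 0xffff, so vq_state_valid(0x80, 0x47b, 0x47c) fails, while a wrapped but consistent pair such as last_avail_idx 0x0002 and used_idx 0xfffe yields 4 and passes. The modulo arithmetic is what lets the check tolerate counter wraparound while still catching a used index that has run ahead of the avail index.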