From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50612)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1cxvg2-000151-GD for qemu-devel@nongnu.org;
 Tue, 11 Apr 2017 09:17:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1cxvfz-0006vA-AT for qemu-devel@nongnu.org;
 Tue, 11 Apr 2017 09:17:46 -0400
Received: from mx1.redhat.com ([209.132.183.28]:56740)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from ) id 1cxvfz-0006uZ-4l
 for qemu-devel@nongnu.org; Tue, 11 Apr 2017 09:17:43 -0400
From: Laurent Vivier
Date: Tue, 11 Apr 2017 15:17:31 +0200
Message-Id: <20170411131733.27542-1-lvivier@redhat.com>
Subject: [Qemu-devel] [PATCH 0/2] migration: fix virtio-rng
To: "Dr . David Alan Gilbert"
Cc: "Michael S . Tsirkin" , Stefan Hajnoczi , Amit Shah ,
 qemu-devel@nongnu.org

When post-copy migration is enabled, the destination guest can ask
for memory from the source when the vmstate is restored.

In the case of virtio, one part of the virtqueue state is migrated by
the vmstate structure (last_avail_idx) and another part is migrated
inside the RAM (used_idx).

On the source side, the virtqueue can still be modified after the
vmstate has been migrated, and the destination side can then ask for
the value in RAM. In this case we have an inconsistency that can
generate this kind of error:

    "VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0"

in hw/virtio/virtio.c:2180, virtio_load().

This happens with virtio-rng because the chr_read() function, which
modifies the virtqueue, is called by the rng backend, and the rng
backend continues to run while the migration is running and the CPU
is stopped.

This series fixes this problem by ignoring chr_read() calls while the
CPU is stopped.

The first patch of the series fixes another problem triggered by this
error case: a use-after-free.

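To illustrate why the values in the error message above trip the check, here is a minimal standalone sketch in C. It is not the code from hw/virtio/virtio.c: the struct and helper names (vq_indices, vq_inuse, vq_indices_consistent) are hypothetical, and it assumes the check compares the 16-bit difference last_avail_idx - used_idx against the queue size, as the wording of the error suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two indices discussed above. */
struct vq_indices {
    uint16_t num;            /* ring size, 0x8 in the error message */
    uint16_t last_avail_idx; /* restored from the vmstate structure */
    uint16_t used_idx;       /* read back from guest RAM */
};

/*
 * Number of requests considered in flight. Both indices are
 * free-running 16-bit counters, so the difference is taken modulo 2^16.
 */
static uint16_t vq_inuse(const struct vq_indices *vq)
{
    return (uint16_t)(vq->last_avail_idx - vq->used_idx);
}

/* Assumed rule: in-flight requests can never exceed the ring size. */
static bool vq_indices_consistent(const struct vq_indices *vq)
{
    return vq_inuse(vq) <= vq->num;
}
```

With the values from the error message (size 0x8, last_avail_idx 0xa, used_idx 0), vq_inuse() gives 0xa, which is larger than the ring, so the two halves of the state are flagged as inconsistent.
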
The probability of hitting this problem is very low: the post-copy
phase is generally very short, so the window in which the virtqueue
can be modified after the vmstate has been sent is very small...
unless you are doing trans-continental guest migration with high
latency and a post-copy phase that can run for minutes.

I've been able to reproduce the problem locally on a single host by
adding network latency with "tc". Another condition is to have an rng
daemon running in the guest to generate events in the virtio-rng
device.

Laurent Vivier (2):
  migration: don't close a file descriptor while it can be in use
  virtio-rng: stop virtqueue while the CPU is stopped

 hw/virtio/trace-events |  2 ++
 hw/virtio/virtio-rng.c | 10 ++++++++++
 migration/migration.c  |  6 +++---
 3 files changed, 15 insertions(+), 3 deletions(-)

-- 
2.9.3