From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 26 Feb 2018 16:08:16 -0000
From: Max Reitz <1751264@bugs.launchpad.net>
Reply-To: Bug 1751264 <1751264@bugs.launchpad.net>
References: <151939024836.30479.4933664010119224710.malonedeb@gac.canonical.com>
Message-Id: <151966129702.12023.7750287097674366749.malone@gac.canonical.com>
Subject: [Qemu-devel] [Bug 1751264] Re: qemu-img convert issue in a tmpfs partition
To: qemu-devel@nongnu.org

Hi,

This is a combination of (in our opinion) a bug in tmpfs (...and I think
maybe btrfs as well?), the fact that the vmdk block driver is not very
well optimized, and qemu-img convert assuming that the filesystem works
as it thinks it does or that at least the block driver can work around
this.

So what happens is that qemu-img convert tries to find out which data it
needs to copy. For this, it queries which parts of the image are
allocated. This involves querying both the format level (vmdk in this
case) and the protocol level (tmpfs in this case).

Now the vmdk block driver is not very well optimized, so it only allows
querying on cluster boundaries (64 kB by default, as far as I can tell).
qcow2 OTOH allows greater areas (I just created a 512 MB image and it
can query the whole image at once).

So the requests go down to the protocol level. We expect that to respond
very quickly to an allocation request (the lseek() you are seeing) --
but tmpfs (and I think btrfs, too) don't do that. They take a rather
long time.

For an example, the attached program seeks through a file (in 64 kB
steps) with SEEK_DATA/SEEK_HOLE. This is what happens:

$ cd /tmp
$ gcc test.c -std=c11 -Wall -Wextra -pedantic -O3
$ qemu-img create -f raw -o preallocation=falloc empty 512M
$ qemu-img create -f raw -o preallocation=falloc ~/empty 512M
$ time ./a.out empty
./a.out empty  0,01s user 23,10s system 99% cpu 23,166 total
$ time ./a.out ~/empty
./a.out ~/empty  0,01s user 0,03s system 96% cpu 0,041 total

So there's a huge difference and that is (in my opinion) a bug in tmpfs.

(When converting from qcow2 you don't notice this, because qcow2 allows
performing a single allocation request for the whole image, so it
doesn't matter much whether that's slow.)

There are three ways around this:

(1) tmpfs (and probably btrfs? -- although I can't reproduce it myself
right now) should be fixed. If they can't tell allocated areas quickly,
they should just report the whole file as allocated.

(2) Our vmdk driver could be optimized. Sure, but that wouldn't solve
the real issue and someone would have to do it first (and we don't have
a strong interest in this, because all format drivers but qcow2 and raw
are there mainly just for reading other formats and converting them to
qcow2).

(3a) qemu-img convert could poll for allocation information less
insistently. One way would be to add a switch to disable this behavior
completely and force it to just read everything. We already have -S 0
which could do this; but just reading all data and then doing zero
detection over it kind of defeats the purpose. If read() + memcmp() is
faster than lseek(SEEK_DATA), then the FS is just doing something wrong.

(3b) Eric Blake has recently added support for a less insistent way to
query allocation status that should only go to the format layer (e.g.
vmdk) and ignore the protocol layer (e.g. tmpfs). Maybe qemu-img convert
should use that.

But in any case, I claim the main issue is in tmpfs.

Max

** Attachment added: "test.c"
   https://bugs.launchpad.net/qemu/+bug/1751264/+attachment/5063575/+files/test.c

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1751264

Title:
  qemu-img convert issue in a tmpfs partition

Status in QEMU:
  New

Bug description:
  qemu-img convert command is slow when the file to convert is located
  in a tmpfs formatted partition.

  v2.1.0 on debian/jessie x64, ext4: 10m14s
  v2.1.0 on debian/jessie x64, tmpfs: 10m15s
  v2.1.0 on debian/stretch x64, ext4: 11m9s
  v2.1.0 on debian/stretch x64, tmpfs: 10m21.362s
  v2.8.0 on debian/jessie x64, ext4: 10m21s
  v2.8.0 on debian/jessie x64, tmpfs: Too long (50min+)
  v2.8.0 on debian/stretch x64, ext4: 10m42s
  v2.8.0 on debian/stretch x64, tmpfs: Too long (50min+)

  It seems that the issue is caused by this commit:
  https://github.com/qemu/qemu/commit/690c7301600162421b928c7f26fd488fd8fa464e

  In order to reproduce this bug:
  1/ mount a tmpfs partition: mount -t tmpfs tmpfs /tmp
  2/ get a vmdk file (we used a 15GB image) and put it on /tmp
  3/ run the 'qemu-img convert -O qcow2 /tmp/file.vmdk /path/to/destination' command

  When we trace the process, we can see that there's a lseek loop which
  is very slow (compared to outside a tmpfs partition).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1751264/+subscriptions
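
For readers without access to the attachment: the reply describes a
program that seeks through a file in 64 kB steps with
SEEK_DATA/SEEK_HOLE. The sketch below shows roughly what such a probe
could look like. It is not the attached test.c, which is not reproduced
in this message. The function name probe_allocation, the PROBE_STEP
constant, and the Linux fallback values for SEEK_DATA/SEEK_HOLE are
assumptions made for illustration.

```c
#define _GNU_SOURCE             /* asks glibc to expose SEEK_DATA / SEEK_HOLE */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fall back to the Linux values if the headers hide them. */
#if !defined(SEEK_DATA) && defined(__linux__)
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Hypothetical step size, matching the 64 kB granularity from the mail. */
#define PROBE_STEP (64 * 1024)

/*
 * Walk the file in PROBE_STEP increments and, at every offset, ask the
 * filesystem where the next data region and the next hole begin.
 * Returns the number of offsets probed, or -1 on error.
 */
long probe_allocation(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0) {
        close(fd);
        return -1;
    }

    long probes = 0;
    for (off_t ofs = 0; ofs < size; ofs += PROBE_STEP) {
        /* ENXIO only means "no data at or after ofs", which is not an error. */
        off_t data = lseek(fd, ofs, SEEK_DATA);
        if (data < 0 && errno != ENXIO) {
            close(fd);
            return -1;
        }

        /* For ofs < size there is always a hole, at the latest at EOF. */
        off_t hole = lseek(fd, ofs, SEEK_HOLE);
        if (hole < 0) {
            close(fd);
            return -1;
        }

        probes++;
    }

    close(fd);
    return probes;
}
```

Calling probe_allocation(argv[1]) from a small main() and running the
binary under time, once on a file in tmpfs and once on a file elsewhere,
would be one way to repeat the comparison shown in the reply. The
per-call cost depends on the filesystem and kernel, so no particular
timing is implied here.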