qemu-devel.nongnu.org archive mirror
From: John Snow <jsnow@redhat.com>
To: Nir Soffer <nsoffer@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Nir Soffer <nirsof@gmail.com>,
	qemu-block <qemu-block@nongnu.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>, Niels de Vos <ndevos@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] block: posix: Always allocate the first block
Date: Fri, 16 Aug 2019 19:00:55 -0400
Message-ID: <c9805bc4-232b-aa72-2f48-878a7d1a55bb@redhat.com>
In-Reply-To: <CAMRbyytThpP1KXPmJLpA_i3JLot7j9UshjcqRerkFtmN_T5Seg@mail.gmail.com>



On 8/16/19 6:45 PM, Nir Soffer wrote:
> On Sat, Aug 17, 2019 at 12:57 AM John Snow <jsnow@redhat.com> wrote:
> 
>     On 8/16/19 5:21 PM, Nir Soffer wrote:
>     > When creating an image with preallocation "off" or "falloc", the first
>     > block of the image is typically not allocated. When using Gluster
>     > storage backed by XFS filesystem, reading this block using direct I/O
>     > succeeds regardless of request length, fooling alignment detection.
>     >
>     > In this case we fall back to a safe value (4096) instead of the optimal
>     > value (512), which may lead to unneeded data copying when aligning
>     > requests.  Allocating the first block avoids the fallback.
>     >
> 
>     Where does this detection/fallback happen? (Can it be improved?)
> 
> 
> In raw_probe_alignment().
> 
> This patch explains the issues:
> https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00568.html
> 
> Here Kevin and I discussed ways to improve it:
> https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00426.html
> 

Thanks for the reading material!
That does help explain this patch better.
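
For readers who have not looked at raw_probe_alignment(), here is a minimal sketch of the probing idea as described in this thread. It is not QEMU's code: the names (struct dio_align, probe_alignment, resolve_alignment) are hypothetical, the candidate sizes 1/512/1024/2048/4096 and the two fallbacks come from Nir's description, and the caller is assumed to have opened the image with O_DIRECT. Unlike QEMU, this sketch simply falls back when no candidate works at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the two probed values. */
struct dio_align {
    size_t request_align; /* alignment of I/O offset and length */
    size_t buf_align;     /* alignment of the memory buffer */
};

/* One probe: a read at offset 0 that the kernel accepts or rejects.
 * With O_DIRECT, a misaligned request normally fails with EINVAL. */
static bool io_accepted(int fd, void *buf, size_t len)
{
    return pread(fd, buf, len, 0) >= 0;
}

/* Apply the fallbacks described in the thread. A probed value of 1 means
 * even a 1-byte request was accepted, which real direct I/O does not allow,
 * so the probe was fooled; 0 means no candidate worked at all. */
struct dio_align resolve_alignment(size_t probed_req, size_t probed_buf,
                                   size_t max_align)
{
    struct dio_align a;
    a.request_align = probed_req > 1 ? probed_req : max_align;
    a.buf_align = probed_buf > 1 ? probed_buf : a.request_align;
    return a;
}

/* fd is assumed to be opened with O_DIRECT; max_align is e.g. 4096. */
struct dio_align probe_alignment(int fd, size_t max_align)
{
    static const size_t sizes[] = {1, 512, 1024, 2048, 4096};
    const size_t n = sizeof(sizes) / sizeof(sizes[0]);
    size_t probed_req = 0, probed_buf = 0;
    void *mem;

    /* posix_memalign() requires a power of two. */
    assert(max_align >= sizeof(void *) && (max_align & (max_align - 1)) == 0);
    /* 2 * max_align so that buf + sizes[i] still has room for max_align bytes. */
    if (posix_memalign(&mem, max_align, 2 * max_align) != 0)
        return resolve_alignment(0, 0, max_align);
    char *buf = mem;

    /* Request alignment: well-aligned buffer, growing length. */
    for (size_t i = 0; i < n && sizes[i] <= max_align; i++) {
        if (io_accepted(fd, buf, sizes[i])) {
            probed_req = sizes[i];
            break;
        }
    }
    /* Buffer alignment: max_align-byte read, buffer shifted by sizes[i]
     * so that it is aligned to exactly sizes[i]. */
    for (size_t i = 0; i < n && sizes[i] <= max_align; i++) {
        if (io_accepted(fd, buf + sizes[i], max_align)) {
            probed_buf = sizes[i];
            break;
        }
    }
    free(mem);
    return resolve_alignment(probed_req, probed_buf, max_align);
}
```

On a plain buffered descriptor every probe succeeds, so both values come out as max_align. That is the same outcome as the "fooled" Gluster-over-XFS case, where an unallocated first block accepts any request length.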

>     > When using preallocation=off, we always allocate at least one
>     > filesystem block:
>     >
>     >     $ ./qemu-img create -f raw test.raw 1g
>     >     Formatting 'test.raw', fmt=raw size=1073741824
>     >
>     >     $ ls -lhs test.raw
>     >     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
>     >
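
A minimal sketch of what "always allocate the first block" amounts to for preallocation=off, assuming a plain POSIX file and a 4096-byte first block. The helper names are hypothetical and this is not the patch's actual code: the file is extended sparsely with ftruncate(), then one zeroed block is written at offset 0 so the filesystem has to back it with storage.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a sparse file of `size` bytes whose first `block` bytes are
 * allocated. Returns 0 on success or a negative errno value. */
int create_with_first_block(const char *path, off_t size, size_t block)
{
    assert(size >= 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    int ret = 0;
    char *zeros = NULL;

    /* Extend the file without allocating anything: it is all one hole. */
    if (ftruncate(fd, size) < 0) {
        ret = -errno;
    } else {
        /* Never write past the requested size. */
        if ((off_t)block > size)
            block = (size_t)size;
        zeros = calloc(1, block ? block : 1);
        if (!zeros) {
            ret = -ENOMEM;
        } else if (block > 0) {
            /* Writing real zeros, unlike leaving a hole, makes common
             * filesystems such as XFS and ext4 allocate the block. */
            ssize_t n = pwrite(fd, zeros, block, 0);
            if (n < 0)
                ret = -errno;
            else if ((size_t)n != block)
                ret = -EIO;
        }
    }
    free(zeros);
    if (close(fd) < 0 && ret == 0)
        ret = -errno;
    return ret;
}

/* Bytes actually allocated; st_blocks counts 512-byte units on Linux. */
long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

On a filesystem with 4 KiB blocks, allocated_bytes() should then report 4096 for a 1 GiB image, in line with the 4.0K that ls -lhs shows above.
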
>     > I did quick performance tests for these flows:
>     > - Provisioning a VM with a new raw image.
>     > - Copying disks with qemu-img convert to new raw target image
>     >
>     > I installed Fedora 29 server on a raw sparse image, measuring the time
>     > from clicking "Begin installation" until the "Reboot" button appears:
>     >
>     > Before(s)  After(s)     Diff(%)
>     > -------------------------------
>     >      356        389        +9.3
>     >
>     > I ran this only once, so we cannot tell much from these results.
>     >
> 
>     That seems like a pretty big difference for just having pre-allocated a
>     single block. What was the actual command line / block graph for
>     that test?
> 
> 
> Having the first block allocated changes the alignment.
> 
> Before this patch, we detect request_alignment=1, so we fall back to 4096.
> Then we detect buf_align=1, so we fall back to the value of request_alignment.
> 
> The guest sees a disk with:
> logical_block_size = 512
> physical_block_size = 512
> 
> But qemu uses:
> request_alignment = 4096
> buf_align = 4096
> 
> storage uses:
> logical_block_size = 512
> physical_block_size = 512
> 
> If the guest does direct I/O using 512-byte alignment, qemu has to copy
> the buffers to align them to 4096 bytes.
> 
> After this patch, qemu detects the alignment correctly, so we have:
> 
> guest
> logical_block_size = 512
> physical_block_size = 512
> 
> qemu
> request_alignment = 512
> buf_align = 512
> 
> storage:
> logical_block_size = 512
> physical_block_size = 512
> 
> We expect this to be more efficient because qemu does not have to emulate
> anything.
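
To make the copying concrete, here is a small sketch (hypothetical helper names, not QEMU's block layer, whose real code paths differ) of the check that decides whether a guest request can be submitted with O_DIRECT as-is: offset and length must be multiples of request_alignment and the buffer address a multiple of buf_align. With both set to 4096, a 512-byte guest request at offset 512 fails the check and has to be widened to the enclosing aligned range [0, 4096); with 512 it passes unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round down/up to a power-of-two alignment. */
static inline uint64_t align_down(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return x & ~(align - 1);
}

static inline uint64_t align_up(uint64_t x, uint64_t align)
{
    return align_down(x + align - 1, align);
}

/* True if a request can be submitted with O_DIRECT as-is: offset and
 * length must be multiples of request_align, the buffer of buf_align. */
bool request_is_aligned(uint64_t offset, uint64_t len, uintptr_t buf,
                        uint64_t request_align, uint64_t buf_align)
{
    return offset % request_align == 0 &&
           len % request_align == 0 &&
           buf % buf_align == 0;
}

/* For a misaligned request, the aligned range [*start, *end) that has to be
 * read through a bounce buffer (and, for a write, written back). */
void bounce_range(uint64_t offset, uint64_t len, uint64_t request_align,
                  uint64_t *start, uint64_t *end)
{
    *start = align_down(offset, request_align);
    *end = align_up(offset + len, request_align);
}
```

In this sketch, each misaligned 512-byte request turns into a 4096-byte aligned I/O plus a memcpy between the bounce buffer and the guest buffer.
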
> 
>     Was this over a network that could explain the variance?
> 
> 
> Maybe. This is a complete install of Fedora 29 server; I'm not sure if
> the installation accesses the network.
> 
>     > The second test was cloning the installation image with qemu-img
>     > convert, doing 10 runs:
>     >
>     >     for i in $(seq 10); do
>     >         rm -f dst.raw
>     >         sleep 10
>     >         time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
>     >     done
>     >
>     > Here is a table comparing the total time spent:
>     >
>     > Type    Before(s)   After(s)    Diff(%)
>     > ---------------------------------------
>     > real      530.028    469.123      -11.4
>     > user       17.204     10.768      -37.4
>     > sys        17.881      7.011      -60.7
>     >
>     > Here we see a very clear improvement in CPU usage.
>     >
> 
>     Hard to argue much with that. I feel a little strange trying to force
>     the allocation of the first block, but I suppose in practice "almost no
>     preallocation" is indistinguishable from "exactly no preallocation" if
>     you squint.
> 
> 
> Right.
> 
> The real issue is that filesystems and block devices do not expose the
> alignment requirement for direct I/O, so we need to use these hacks and
> assumptions.
> 
> With local XFS we use xfsctl(XFS_IOC_DIOINFO) to get request_alignment,
> but this does not help for an XFS filesystem used by Gluster on the
> server side.
> 
> I hope that Niels is working on adding a similar ioctl for Gluster, so it
> can expose the properties of the remote filesystem.
> 
> Nir

That sounds quite a bit less hacky, but I agree we still have to do what
we can in the meantime.

(It looks like you've been hashing this out with Kevin for a while, so
I'm going to sheepishly defer to his judgment on this patch. While I
think it's probably a fine trade-off, I can't really say off-hand if
there's a better, more targeted way to accomplish it.)

--js



Thread overview: 18+ messages
2019-08-16 21:21 [Qemu-devel] [PATCH] block: posix: Always allocate the first block Nir Soffer
2019-08-16 21:57 ` [Qemu-devel] [Qemu-block] " John Snow
2019-08-16 22:45   ` Nir Soffer
2019-08-16 23:00     ` John Snow [this message]
2019-08-22 11:30 ` [Qemu-devel] " Nir Soffer
2019-08-22 14:28 ` Max Reitz
2019-08-22 16:39   ` Nir Soffer
2019-08-22 18:11     ` Max Reitz
2019-08-22 19:01       ` Nir Soffer
2019-08-23 13:58         ` Max Reitz
2019-08-23 16:30           ` Nir Soffer
2019-08-23 17:41             ` Max Reitz
2019-08-23 16:48           ` Nir Soffer
2019-08-23 17:53             ` Max Reitz
2019-08-24 22:57               ` Nir Soffer
2019-08-25  7:44 ` [Qemu-devel] [Qemu-block] " Maxim Levitsky
2019-08-25 19:51   ` Nir Soffer
2019-08-25 22:17     ` Maxim Levitsky
