From: John Snow <jsnow@redhat.com>
To: Nir Soffer <nsoffer@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Nir Soffer <nirsof@gmail.com>,
	qemu-block <qemu-block@nongnu.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>, Niels de Vos <ndevos@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] block: posix: Always allocate the first block
Date: Fri, 16 Aug 2019 19:00:55 -0400
Message-ID: <c9805bc4-232b-aa72-2f48-878a7d1a55bb@redhat.com>
In-Reply-To: <CAMRbyytThpP1KXPmJLpA_i3JLot7j9UshjcqRerkFtmN_T5Seg@mail.gmail.com>



On 8/16/19 6:45 PM, Nir Soffer wrote:
> On Sat, Aug 17, 2019 at 12:57 AM John Snow <jsnow@redhat.com> wrote:
> 
>     On 8/16/19 5:21 PM, Nir Soffer wrote:
>     > When creating an image with preallocation "off" or "falloc", the first
>     > block of the image is typically not allocated. When using Gluster
>     > storage backed by an XFS filesystem, reading this block using direct I/O
>     > succeeds regardless of request length, fooling alignment detection.
>     >
>     > In this case we fall back to a safe value (4096) instead of the optimal
>     > value (512), which may lead to unneeded data copying when aligning
>     > requests.  Allocating the first block avoids the fallback.
>     >
> 
>     Where does this detection/fallback happen? (Can it be improved?)
> 
> 
> In raw_probe_alignment().
> 
> This patch explains the issue:
> https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00568.html
> 
> Kevin and I discussed ways to improve it here:
> https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00426.html
> 

Thanks for the reading!
That does help explain this patch better.
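
To check my own understanding of the fallback, here is a rough Python
model of the probing behaviour as described above. It is not the code
in raw_probe_alignment(); read_ok(), CANDIDATES and SAFE_FALLBACK are
names I made up, and read_ok() merely stands in for a direct I/O read
of the given length at offset 0.

```python
from typing import Callable

SAFE_FALLBACK = 4096                   # the "safe value" from the commit message
CANDIDATES = (512, 1024, 2048, 4096)   # assumed probe sizes, for illustration only


def probe_request_alignment(read_ok: Callable[[int], bool]) -> int:
    """Return the detected request alignment, or the safe fallback.

    read_ok(length) is a hypothetical stand-in for "an O_DIRECT read of
    `length` bytes at offset 0 succeeded".
    """
    # If even a 1-byte read succeeds, the probe tells us nothing
    # (detected alignment would be 1), so use the safe value.
    if read_ok(1):
        return SAFE_FALLBACK
    # Otherwise the smallest length the storage accepts is the alignment.
    for size in CANDIDATES:
        if read_ok(size):
            return size
    return SAFE_FALLBACK


def strict_512(length: int) -> bool:
    """Storage that enforces 512-byte alignment (allocated first block)."""
    return length % 512 == 0


def accepts_anything(length: int) -> bool:
    """Unallocated first block on Gluster/XFS: every length succeeds."""
    return True
```

With strict_512 the model settles on 512; with accepts_anything it
ends up at 4096, which is the case the patch is trying to avoid.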

>     > When using preallocation=off, we always allocate at least one
>     > filesystem block:
>     >
>     >     $ ./qemu-img create -f raw test.raw 1g
>     >     Formatting 'test.raw', fmt=raw size=1073741824
>     >
>     >     $ ls -lhs test.raw
>     >     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
>     >
>     > I did quick performance tests for these flows:
>     > - Provisioning a VM with a new raw image.
>     > - Copying disks with qemu-img convert to new raw target image
>     >
>     > I installed Fedora 29 server on a raw sparse image, measuring the time
>     > from clicking "Begin installation" until the "Reboot" button appears:
>     >
>     > Before(s)  After(s)     Diff(%)
>     > -------------------------------
>     >      356        389        +8.4
>     >
>     > I ran this only once, so we cannot tell much from these results.
>     >
> 
>     That seems like a pretty big difference for just having pre-allocated a
>     single block. What was the actual command line / block graph for
>     that test?
> 
> 
> Having the first block allocated changes the alignment.
> 
> Before this patch, we detect request_alignment=1, so we fall back to 4096.
> Then we detect buf_align=1, so we fall back to the value of request_alignment.
> 
> The guest sees a disk with:
> logical_block_size = 512
> physical_block_size = 512
> 
> But qemu uses:
> request_alignment = 4096
> buf_align = 4096
> 
> storage uses:
> logical_block_size = 512
> physical_block_size = 512
> 
> If the guest does direct I/O using 512-byte alignment, qemu has to copy
> the buffers to align them to 4096 bytes.
> 
> After this patch, qemu detects the alignment correctly, so we have:
> 
> guest:
> logical_block_size = 512
> physical_block_size = 512
> 
> qemu:
> request_alignment = 512
> buf_align = 512
> 
> storage:
> logical_block_size = 512
> physical_block_size = 512
> 
> We expect this to be more efficient because qemu does not have to emulate
> anything.
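
To put a number on the extra work, here is a small sketch of the
arithmetic for widening a request to the alignment boundary. The
helper names aligned_span() and padding_bytes() are mine, not QEMU's,
and this only models the sizes involved, not the actual bounce-buffer
code.

```python
def aligned_span(offset: int, length: int, align: int) -> tuple[int, int]:
    """Return (start, end) of the smallest aligned range covering the request."""
    start = offset - offset % align                 # round offset down
    end = -(-(offset + length) // align) * align    # round end up
    return start, end


def padding_bytes(offset: int, length: int, align: int) -> int:
    """Bytes handled beyond what the guest actually asked for."""
    start, end = aligned_span(offset, length, align)
    return (end - start) - length
```

For a 512-byte guest request at offset 512, an alignment of 4096
widens the range to 0..4096, so 3584 of the 4096 bytes are padding.
With an alignment of 512 the request passes through unchanged.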
> 
>     Was this over a network that could explain the variance?
> 
> 
> Maybe. This is a complete install of Fedora 29 server; I'm not sure whether
> the installation accesses the network.
> 
>     > The second test was cloning the installation image with qemu-img
>     > convert, doing 10 runs:
>     >
>     >     for i in $(seq 10); do
>     >         rm -f dst.raw
>     >         sleep 10
>     >         time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
>     >     done
>     >
>     > Here is a table comparing the total time spent:
>     >
>     > Type    Before(s)   After(s)    Diff(%)
>     > ---------------------------------------
>     > real      530.028    469.123      -11.4
>     > user       17.204     10.768      -37.4
>     > sys        17.881      7.011      -60.7
>     >
>     > Here we see very clear improvement in CPU usage.
>     >
> 
>     Hard to argue much with that. I feel a little strange trying to force
>     the allocation of the first block, but I suppose in practice "almost no
>     preallocation" is indistinguishable from "exactly no preallocation" if
>     you squint.
> 
> 
> Right.
> 
> The real issue is that filesystems and block devices do not expose the
> alignment requirement for direct I/O, so we need to use these hacks and
> assumptions.
> 
> With local XFS we use xfsctl(XFS_IOC_DIOINFO) to get request_alignment,
> but this does not help for an XFS filesystem used by Gluster on the
> server side.
> 
> I hope that Niels is working on adding a similar ioctl for Gluster, so it
> can expose the properties of the remote filesystem.
> 
> Nir

That sounds quite a bit less hacky, but I agree we still have to do what
we can in the meantime.
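
For the record, this is roughly how I picture "allocate the first
block" when creating an otherwise sparse image. It is only a sketch in
Python: the function name and the 4096-byte block size are my
assumptions, and the patch itself may well allocate the block
differently (for example with fallocate rather than writing zeros).

```python
import os


def create_sparse_with_first_block(path: str, size: int, block: int = 4096) -> None:
    """Create a sparse file of `size` bytes whose first block is allocated."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Set the virtual size; this leaves the whole file as a hole.
        os.ftruncate(fd, size)
        # Write zeros over the first block so that it gets allocated.
        # The contents are unchanged, since a hole already reads as zeros.
        if size > 0:
            os.pwrite(fd, b"\0" * min(block, size), 0)
            os.fsync(fd)
    finally:
        os.close(fd)
```

On most filesystems "ls -lhs" on the result should then show a few
kilobytes allocated instead of zero, similar to the 4.0K in the
example above.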

(It looks like you've been hashing this out with Kevin for a while, so
I'm going to sheepishly defer to his judgment on this patch. While I
think it's probably a fine trade-off, I can't really say off-hand if
there's a better, more targeted way to accomplish it.)

--js

