From: Max Reitz <mreitz@redhat.com>
To: Nir Soffer <nirsof@gmail.com>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Nir Soffer <nsoffer@redhat.com>,
	integration@gluster.org, qemu-devel@nongnu.org,
	Niels de Vos <ndevos@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] block: gluster: Probe alignment limits
Date: Wed, 21 Aug 2019 19:04:17 +0200
Message-ID: <9b59c887-ff97-ff0a-fa18-ef9a19c1ad6e@redhat.com>
In-Reply-To: <20190817212111.13265-1-nsoffer@redhat.com>


On 17.08.19 23:21, Nir Soffer wrote:
> Implement alignment probing similar to file-posix, by reading from the
> first 4k of the image.
> 
> Before this change, provisioning a VM on storage with sector size of
> 4096 bytes would fail when the installer tries to create filesystems. Here
> is an example command that reproduces this issue:
> 
>     $ qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
>         -drive file=gluster://gluster1/gv0/fedora29.raw,format=raw,cache=none \
>         -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso
> 
> The installer fails within a few seconds when trying to create a filesystem
> on /dev/mapper/fedora-root. In the error report we can see that it failed
> with EINVAL (I could not extract the error from the guest).
> 
> Copying disk fails with EINVAL:
> 
>     $ qemu-img convert -p -f raw -O raw -t none -T none \
>         gluster://gluster1/gv0/fedora29.raw \
>         gluster://gluster1/gv0/fedora29-clone.raw
>     qemu-img: error while writing sector 4190208: Invalid argument
> 
> This fixes, for gluster:// images, the same issue that was fixed for
> file-posix in commit a6b257a08e3d (file-posix: Handle undetectable
> alignment).
> 
> This fix has the same limitation: the first block of the image must be
> allocated, otherwise we cannot detect the alignment and fall back to a
> safe value (4096) even when using storage with sector size of 512 bytes.
> 
> Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> ---
>  block/gluster.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index f64dc5b01e..d936240b72 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -52,6 +52,9 @@
>  
>  #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"
>  
> +/* The value is known only on the server side. */
> +#define MAX_ALIGN 4096
> +
>  typedef struct GlusterAIOCB {
>      int64_t size;
>      int ret;
> @@ -902,8 +905,52 @@ out:
>      return ret;
>  }
>  
> +/*
> + * Check if read is allowed with given memory buffer and length.
> + *
> + * This function is used to check O_DIRECT request alignment.
> + */
> +static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)
> +{
> +    ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);
> +    return ret >= 0 || errno != EINVAL;

Is glfs_pread() guaranteed to fail with EINVAL on invalid alignment?
file-posix says this is only the case on Linux (for normal files).  Then
again, I don’t know whether the gluster driver works on anything but
Linux anyway.
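
To spell out what the return expression above decides, here is a minimal
standalone sketch of the same rule.  read_result_means_aligned() is a
hypothetical name used only for illustration; it takes the read result and
errno as plain arguments instead of calling glfs_pread().

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper mirroring the return expression in
 * gluster_is_io_aligned(): only a failure with EINVAL is treated as
 * "misaligned"; success or any other error counts as "aligned".
 */
static bool read_result_means_aligned(ssize_t ret, int err)
{
    return ret >= 0 || err != EINVAL;
}
```

Under this rule, a backend that reported misalignment with some other
errno would make the very first (1-byte) probe count as aligned, and the
patch would then fall back to MAX_ALIGN rather than detect the real value.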

> +}
> +
> +static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,
> +                                    Error **errp)
> +{
> +    char *buf;
> +    size_t alignments[] = {1, 512, 1024, 2048, 4096};
> +    size_t align;
> +    int i;
> +
> +    buf = qemu_memalign(MAX_ALIGN, MAX_ALIGN);
> +
> +    for (i = 0; i < ARRAY_SIZE(alignments); i++) {
> +        align = alignments[i];
> +        if (gluster_is_io_aligned(fd, buf, align)) {
> +            /* Fallback to safe value. */
> +            bs->bl.request_alignment = (align != 1) ? align : MAX_ALIGN;
> +            break;
> +        }
> +    }

I don’t like the fact that the last element of alignments[] should be
the same as MAX_ALIGN, without that ever having been made explicit anywhere.
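
One possible way to make that coupling explicit, sketched outside of QEMU:
derive the candidates from MAX_ALIGN instead of listing them by hand.
probe_fn, probe_alignment() and fake_probe() are hypothetical names for
illustration; in the patch the probe would be gluster_is_io_aligned().

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as in the patch; the real limit is known only to the server. */
#define MAX_ALIGN 4096

/* Hypothetical stand-in for gluster_is_io_aligned(): true if a read of
 * len bytes is accepted. */
typedef bool (*probe_fn)(size_t len, void *opaque);

/*
 * Return the detected request alignment, or 0 if no candidate worked.
 * Candidates are 1, 512, 1024, ..., MAX_ALIGN, so the largest candidate
 * equals MAX_ALIGN by construction.
 */
static size_t probe_alignment(probe_fn probe, void *opaque)
{
    size_t align;

    /* A working 1-byte read means the alignment is undetectable:
     * fall back to the safe value, as the patch does. */
    if (probe(1, opaque)) {
        return MAX_ALIGN;
    }

    for (align = 512; align <= MAX_ALIGN; align *= 2) {
        if (probe(align, opaque)) {
            return align;
        }
    }

    return 0;
}

/* Fake backend for illustration only: accepts lengths that are a
 * multiple of the sector size stored in *opaque. */
static bool fake_probe(size_t len, void *opaque)
{
    size_t sector = *(size_t *)opaque;

    return len % sector == 0;
}
```

With the fake backend, a sector size of 512 or 4096 is detected as such, a
sector size of 1 yields the MAX_ALIGN fallback, and anything above
MAX_ALIGN yields 0, which corresponds to the error path in the patch.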

It’s a bit worse in the file-posix patch, because if getpagesize() is
greater than 4k, max_align will be greater than 4k.  But MAX_BLOCKSIZE
is 4k, too, so I suppose we wouldn’t support any block size beyond that
anyway, so the error message is still appropriate.

> +
> +    qemu_vfree(buf);
> +
> +    if (!bs->bl.request_alignment) {
> +        error_setg(errp, "Could not find working O_DIRECT alignment");
> +        error_append_hint(errp, "Try cache.direct=off\n");
> +    }
> +}
> +
>  static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
> +    BDRVGlusterState *s = bs->opaque;
> +
> +    gluster_probe_alignment(bs, s->fd, errp);
> +
> +    bs->bl.min_mem_alignment = bs->bl.request_alignment;

Well, I’ll just trust you that there is no weird system where the memory
alignment is greater than the request alignment.

> +    bs->bl.opt_mem_alignment = MAX(bs->bl.request_alignment, MAX_ALIGN);

Isn’t request_alignment guaranteed to not exceed MAX_ALIGN, i.e. isn’t
this always MAX_ALIGN?
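
A quick standalone check of that arithmetic.  This is only a sketch: the
MAX() macro below is a local stand-in for QEMU's, and the value list
assumes request_alignment is either one of the probed sizes or still 0
after a failed probe, as the error check in the patch implies.

```c
#include <stddef.h>

#define MAX_ALIGN 4096
/* Local stand-in for QEMU's MAX() macro. */
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/*
 * Values gluster_probe_alignment() can leave in request_alignment:
 * 0 if probing failed (assumption), otherwise 512..4096, since a
 * detected alignment of 1 is replaced by MAX_ALIGN.
 */
static const size_t possible_request_alignments[] = {
    0, 512, 1024, 2048, 4096
};

/* Return 1 if MAX(request_alignment, MAX_ALIGN) is MAX_ALIGN for every
 * value listed above, 0 otherwise. */
static int opt_mem_alignment_is_always_max_align(void)
{
    size_t n = sizeof(possible_request_alignments) /
               sizeof(possible_request_alignments[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (MAX(possible_request_alignments[i], MAX_ALIGN) != MAX_ALIGN) {
            return 0;
        }
    }

    return 1;
}
```

If that holds, the assignment could simply use MAX_ALIGN.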

>      bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
>  }

file-posix checks in raw_reopen_prepare() whether we can find a
working alignment for the new FD.  Shouldn’t we do the same in
qemu_gluster_reopen_prepare()?

Max

