All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Maydell <peter.maydell@linaro.org>
To: David CARLIER <devnexen@gmail.com>
Cc: QEMU Trivial <qemu-trivial@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH 2/3] exec: posix_madvise usage on SunOS.
Date: Sat, 18 Jul 2020 15:15:38 +0100	[thread overview]
Message-ID: <CAFEAcA96mh_4EkKz31HgzfPOEQvhta8VTcvMV=An8Us0+x=NfQ@mail.gmail.com> (raw)
In-Reply-To: <CA+XhMqwtUrSpCqNGEETBijewzvmpno8OAX_PKSShDP_gUQ-3VQ@mail.gmail.com>

On Sat, 18 Jul 2020 at 14:21, David CARLIER <devnexen@gmail.com> wrote:
>
> From a9e3cced279ae55a59847ba232f7828bc2479367 Mon Sep 17 00:00:00 2001
> From: David Carlier <devnexen@gmail.com>
> Date: Sat, 18 Jul 2020 13:29:44 +0100
> Subject: [PATCH 2/3] exec: posix_madvise usage on SunOS.
>
> with _XOPEN_SOURCE set, the older mman.h API based on caddr_t handling
> is not accessible thus using posix_madvise here.
>
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
>  exec.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/exec.c b/exec.c
> index 6f381f98e2..0466a75b89 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3964,7 +3964,15 @@ int ram_block_discard_range(RAMBlock *rb,
> uint64_t start, size_t length)
>               * fallocate'd away).
>               */
>  #if defined(CONFIG_MADVISE)
> +#if !defined(CONFIG_SOLARIS)
>              ret =  madvise(host_startaddr, length, MADV_DONTNEED);
> +#else
> +        /*
> +         * mmap and its caddr_t based api is not accessible
> +         * with _XOPEN_SOURCE set on illumos
> +         */
> +            ret =  posix_madvise(host_startaddr, length, POSIX_MADV_DONTNEED);
> +#endif

Hi. I'm not sure this patch will do the right thing, because
I don't think that Solaris's POSIX_MADV_DONTNEED provides
the semantics that this QEMU function says it needs. The
comment at the top of the function says:

 * Unmap pages of memory from start to start+length such that
 * they a) read as 0, b) Trigger whatever fault mechanism
 * the OS provides for postcopy.
 * The pages must be unmapped by the end of the function.

(Aside: the use of 'unmap' in this comment is a bit confusing,
because it clearly doesn't mean 'unmap' if it wants read-as-0.
And the reference to faults on postcopy is incomprehensible
to me: if memory is read-as-0 it isn't going to fault.)

Linux's madvise(MADV_DONTNEED) does guarantee us this
read-as-zero behaviour. (It's a silly API choice that Linux
put this behaviour behind madvise, which is supposed to be
merely advisory, but that's how it is.) The Solaris
posix_madvise() manpage says it is merely advisory and
doesn't affect the behaviour of accesses to the memory.

If posix_madvise() behaviour was OK in this function, the
right way to fix this would be to use qemu_madvise()
instead, which already provides this "if host has
madvise(), use it, otherwise use posix_madvise()" logic.
But I suspect that the direct madvise() here is deliberate.

Side note: not sure the current code is correct for the
BSDs either -- they have madvise() but don't provide
Linux's really-read-as-zero guarantee for MADV_DONTNEED.
So we should probably be doing something else there, and
whatever that something-else is is probably also what
Solaris wants.

We use ram_block_discard_range() only in migration and
in virtio-balloon and virtio-mem; I've cc'd some people
who hopefully understand what the requirements on this
function are and might have a view on what the not-Linux
implementation should look like. (David Gilbert: git
blame says you wrote this code :-))

thanks
-- PMM


  reply	other threads:[~2020-07-18 14:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-18 13:20 [PATCH 2/3] exec: posix_madvise usage on SunOS David CARLIER
2020-07-18 14:15 ` Peter Maydell [this message]
2020-07-18 15:23   ` David CARLIER
2020-07-20 19:13   ` Dr. David Alan Gilbert
2020-07-21  8:22     ` David Hildenbrand
2020-07-21  9:31       ` Peter Maydell
2020-07-21  9:48         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFEAcA96mh_4EkKz31HgzfPOEQvhta8VTcvMV=An8Us0+x=NfQ@mail.gmail.com' \
    --to=peter.maydell@linaro.org \
    --cc=devnexen@gmail.com \
    --cc=dgilbert@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-trivial@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.