All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
@ 2017-02-24 17:27 Daniel P. Berrange
  2017-02-24 17:33 ` no-reply
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Daniel P. Berrange @ 2017-02-24 17:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michal Privoznik, Jitendra Kolhe, Stefan Hajnoczi, Paolo Bonzini,
	Daniel P. Berrange

When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.

Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---

NB, I have not tested performance of this, so no idea if this
makes it better/worse/no-change. Would appreciate if Jitendra
could repeat tests to see if it impacts scalability at all.

 util/oslib-posix.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 35012b9..2a5bb93 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -355,7 +355,20 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
 
         /* MAP_POPULATE silently ignores failures */
         for (i = 0; i < numpages; i++) {
-            memset(area + (hpagesize * i), 0, 1);
+            /*
+             * Read & write back the same value, so we don't
+             * corrupt existinng user/app data that might be
+             * stored.
+             *
+             * 'volatile' to stop compiler optimizing this away
+             * to a no-op
+             *
+             * TODO: get a better solution from kernel so we
+             * don't need to write at all so we don't cause
+             * wear on the storage backing the region...
+             */
+            volatile char val = *(area + (hpagesize * i));
+            *(area + (hpagesize * i)) = val;
         }
     }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24 17:27 [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc Daniel P. Berrange
@ 2017-02-24 17:33 ` no-reply
  2017-02-27  9:25   ` Daniel P. Berrange
  2017-02-24 19:04 ` Eric Blake
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: no-reply @ 2017-02-24 17:33 UTC (permalink / raw)
  To: berrange; +Cc: famz, qemu-devel, jitendra.kolhe, mprivozn, stefanha, pbonzini

Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20170224172714.26026-1-berrange@redhat.com
Type: series
Subject: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/1487262963-11519-1-git-send-email-peter.maydell@linaro.org -> patchew/1487262963-11519-1-git-send-email-peter.maydell@linaro.org
 - [tag update]      patchew/1487920971-16519-1-git-send-email-zhangchen.fnst@cn.fujitsu.com -> patchew/1487920971-16519-1-git-send-email-zhangchen.fnst@cn.fujitsu.com
 * [new tag]         patchew/20170224172714.26026-1-berrange@redhat.com -> patchew/20170224172714.26026-1-berrange@redhat.com
Switched to a new branch 'test'
131074e os: don't corrupt pre-existing memory-backend data with prealloc

=== OUTPUT BEGIN ===
Checking PATCH 1/1: os: don't corrupt pre-existing memory-backend data with prealloc...
ERROR: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt
#42: FILE: util/oslib-posix.c:370:
+            volatile char val = *(area + (hpagesize * i));

total: 1 errors, 0 warnings, 21 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24 17:27 [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc Daniel P. Berrange
  2017-02-24 17:33 ` no-reply
@ 2017-02-24 19:04 ` Eric Blake
  2017-02-27 13:28 ` Stefan Hajnoczi
  2017-02-27 15:53 ` Andrea Arcangeli
  3 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2017-02-24 19:04 UTC (permalink / raw)
  To: Daniel P. Berrange, qemu-devel
  Cc: Jitendra Kolhe, Michal Privoznik, Stefan Hajnoczi, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1631 bytes --]

On 02/24/2017 11:27 AM, Daniel P. Berrange wrote:
> When using a memory-backend object with prealloc turned on, QEMU
> will memset() the first byte in every memory page to zero. While
> this might have been acceptable for memory backends associated
> with RAM, this corrupts application data for NVDIMMs.
> 
> Instead of setting every page to zero, read the current byte
> value and then just write that same value back, so we are not
> corrupting the original data. Directly write the value instead
> of memset()ing it, since there's no benefit to memset for a
> single byte write.
> 
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
> 

>          /* MAP_POPULATE silently ignores failures */
>          for (i = 0; i < numpages; i++) {
> -            memset(area + (hpagesize * i), 0, 1);
> +            /*
> +             * Read & write back the same value, so we don't
> +             * corrupt existinng user/app data that might be

s/existinng/existing/

> +             * stored.
> +             *
> +             * 'volatile' to stop compiler optimizing this away
> +             * to a no-op
> +             *
> +             * TODO: get a better solution from kernel so we
> +             * don't need to write at all so we don't cause
> +             * wear on the storage backing the region...
> +             */
> +            volatile char val = *(area + (hpagesize * i));
> +            *(area + (hpagesize * i)) = val;
>          }
>      }
>  
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24 17:33 ` no-reply
@ 2017-02-27  9:25   ` Daniel P. Berrange
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel P. Berrange @ 2017-02-27  9:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, jitendra.kolhe, mprivozn, stefanha, pbonzini

On Fri, Feb 24, 2017 at 09:33:05AM -0800, no-reply@patchew.org wrote:
> === OUTPUT BEGIN ===
> Checking PATCH 1/1: os: don't corrupt pre-existing memory-backend data with prealloc...
> ERROR: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt

ERROR: checkpatch.pl is usually wrong ;-P

Heh, it is refering to a doc in the kernel source tree, which does not even
exist at that path location anymore :-)


> #42: FILE: util/oslib-posix.c:370:
> +            volatile char val = *(area + (hpagesize * i));
> 
> total: 1 errors, 0 warnings, 21 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> === OUTPUT END ===

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24 17:27 [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc Daniel P. Berrange
  2017-02-24 17:33 ` no-reply
  2017-02-24 19:04 ` Eric Blake
@ 2017-02-27 13:28 ` Stefan Hajnoczi
  2017-02-27 15:53 ` Andrea Arcangeli
  3 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2017-02-27 13:28 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, Jitendra Kolhe, Michal Privoznik, Stefan Hajnoczi,
	Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

On Fri, Feb 24, 2017 at 05:27:14PM +0000, Daniel P. Berrange wrote:
> When using a memory-backend object with prealloc turned on, QEMU
> will memset() the first byte in every memory page to zero. While
> this might have been acceptable for memory backends associated
> with RAM, this corrupts application data for NVDIMMs.
> 
> Instead of setting every page to zero, read the current byte
> value and then just write that same value back, so we are not
> corrupting the original data. Directly write the value instead
> of memset()ing it, since there's no benefit to memset for a
> single byte write.
> 
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
> 
> NB, I have not tested performance of this, so no idea if this
> makes it better/worse/no-change. Would appreciate if Jitendra
> could repeat tests to see if it impacts scalability at all.
> 
>  util/oslib-posix.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24 17:27 [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc Daniel P. Berrange
                   ` (2 preceding siblings ...)
  2017-02-27 13:28 ` Stefan Hajnoczi
@ 2017-02-27 15:53 ` Andrea Arcangeli
  3 siblings, 0 replies; 16+ messages in thread
From: Andrea Arcangeli @ 2017-02-27 15:53 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, Jitendra Kolhe, Michal Privoznik, Stefan Hajnoczi,
	Paolo Bonzini

Hello,

On Fri, Feb 24, 2017 at 05:27:14PM +0000, Daniel P. Berrange wrote:
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 35012b9..2a5bb93 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -355,7 +355,20 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
>  
>          /* MAP_POPULATE silently ignores failures */
>          for (i = 0; i < numpages; i++) {
> -            memset(area + (hpagesize * i), 0, 1);
> +            /*
> +             * Read & write back the same value, so we don't
> +             * corrupt existinng user/app data that might be
> +             * stored.
> +             *
> +             * 'volatile' to stop compiler optimizing this away
> +             * to a no-op
> +             *
> +             * TODO: get a better solution from kernel so we
> +             * don't need to write at all so we don't cause
> +             * wear on the storage backing the region...
> +             */

It would be better that if MAP_POPULATE failed like mlock does, at
least for those get_user_pages errors like -ENOMEM that are already a
retval for mmap it sounds weird that it's not being forwarded. It'd be
enough to call do_munmap to rollback all mmap work before returning
-ENOMEM from mmap instead of succees.

-EFAULT or other get_user_pages errors are more troublesome, as mmap
wouldn't otherwise return -EFAULT, but those would generally be qemu
bugs or hardware errors and MAP_POPULATE I doubt is meant to be a
debug/robustness aid.  All we care about here is memory resource
management and not to risk -ENOMEM at runtime and immediate peak
performance. So perhaps a kernel patch that forwards -ENOMEM from
__mm_populate to mmap retval would be enough to make it usable?

> +            volatile char val = *(area + (hpagesize * i));
> +            *(area + (hpagesize * i)) = val;

It's not "var" that should be volatile, nor the pointer, but only the
memory area you write to, because the write to the memory are is the
only thing we care about.

If "val" is volatile gcc could read the memory area and cache the
value of the memory area, write it into the volatile var, then read
the volatile var again, compare it with the cached value and skip the
write to the memory if it's equal. In practice it's likely not doing
that, and making var volatile has the same effect, so it's probably
only a theoretical issue of course.

I would suggest this to correct it (untested):

	char *tmparea = area + (hpagesize * i);
	*(volatile char *)tmparea = *tmparea;

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-27 13:46   ` Rik van Riel
@ 2017-02-27 13:58     ` Daniel P. Berrange
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel P. Berrange @ 2017-02-27 13:58 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stefan Hajnoczi, qemu-devel, Jitendra Kolhe, Paolo Bonzini,
	Stefan Hajnoczi, Michal Privoznik, Andrea Arcangeli

On Mon, Feb 27, 2017 at 08:46:10AM -0500, Rik van Riel wrote:
> On Mon, 2017-02-27 at 11:10 +0000, Stefan Hajnoczi wrote:
> > On Thu, Feb 23, 2017 at 10:59:22AM +0000, Daniel P. Berrange wrote:
> > > When using a memory-backend object with prealloc turned on, QEMU
> > > will memset() the first byte in every memory page to zero. While
> > > this might have been acceptable for memory backends associated
> > > with RAM, this corrupts application data for NVDIMMs.
> > > 
> > > Instead of setting every page to zero, read the current byte
> > > value and then just write that same value back, so we are not
> > > corrupting the original data.
> > > 
> > > Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> > > ---
> > > 
> > > I'm unclear if this is actually still safe in practice ? Is the
> > > compiler permitted to optimize away the read+write since it doesn't
> > > change the memory value. I'd hope not, but I've been surprised
> > > before...
> > > 
> > > IMHO this is another factor in favour of requesting an API from
> > > the kernel to provide the prealloc behaviour we want.
> > > 
> > >  util/oslib-posix.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > > index 35012b9..8f5b656 100644
> > > --- a/util/oslib-posix.c
> > > +++ b/util/oslib-posix.c
> > > @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t
> > > memory, Error **errp)
> > >  
> > >          /* MAP_POPULATE silently ignores failures */
> > >          for (i = 0; i < numpages; i++) {
> > > -            memset(area + (hpagesize * i), 0, 1);
> > > +            char val = *(area + (hpagesize * i));
> > > +            memset(area + (hpagesize * i), 0, val);
> > 
> > Please include a comment in the final patch explaining why we want to
> > preserve memory contents.
> > 
> > In the case of NVDIMM I'm not sure if the memset is needed at
> > all.  The
> > memory already exists - no new pages need to be allocated by the
> > kernel.
> > We just want the page table entries to be populated for the NVDIMM
> > when
> > -mem-prealloc is used.
> > 
> > Perhaps Andrea or Rik have ideas on improving the kernel interface
> > and
> > whether mmap(MAP_POPULATE) should be used with NVDIMM instead of this
> > userspace "touch every page" workaround?
> 
> Why do we need the page table entries to be populated
> in advance at all?

It is a choice apps using QEMU have - they can tell QEMU whether they
want prealloc or not - if they decide they do want it, then we should
not be corrupting the data.

> The high cost of the page fault for regular memory
> is zeroing out the memory pages before we give them
> to userspace.

NVDIMM in the guest might be backed by regular memory in the host - QEMU
doesn't require use of NVDIMM in the host.

> Simply faulting in the NVDIMM memory as it is touched
> may make more sense than treating it like DRAM,
> especially given that with DAX, NVDIMM areas may be
> orders of magnitude larger than RAM, and we really
> do not want to set up all the page tables for every
> part of the guest DAX "disk".



Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-27 11:10 ` Stefan Hajnoczi
@ 2017-02-27 13:46   ` Rik van Riel
  2017-02-27 13:58     ` Daniel P. Berrange
  0 siblings, 1 reply; 16+ messages in thread
From: Rik van Riel @ 2017-02-27 13:46 UTC (permalink / raw)
  To: Stefan Hajnoczi, Daniel P. Berrange
  Cc: qemu-devel, Jitendra Kolhe, Paolo Bonzini, Stefan Hajnoczi,
	Michal Privoznik, Andrea Arcangeli

On Mon, 2017-02-27 at 11:10 +0000, Stefan Hajnoczi wrote:
> On Thu, Feb 23, 2017 at 10:59:22AM +0000, Daniel P. Berrange wrote:
> > When using a memory-backend object with prealloc turned on, QEMU
> > will memset() the first byte in every memory page to zero. While
> > this might have been acceptable for memory backends associated
> > with RAM, this corrupts application data for NVDIMMs.
> > 
> > Instead of setting every page to zero, read the current byte
> > value and then just write that same value back, so we are not
> > corrupting the original data.
> > 
> > Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> > ---
> > 
> > I'm unclear if this is actually still safe in practice ? Is the
> > compiler permitted to optimize away the read+write since it doesn't
> > change the memory value. I'd hope not, but I've been surprised
> > before...
> > 
> > IMHO this is another factor in favour of requesting an API from
> > the kernel to provide the prealloc behaviour we want.
> > 
> >  util/oslib-posix.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > index 35012b9..8f5b656 100644
> > --- a/util/oslib-posix.c
> > +++ b/util/oslib-posix.c
> > @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t
> > memory, Error **errp)
> >  
> >          /* MAP_POPULATE silently ignores failures */
> >          for (i = 0; i < numpages; i++) {
> > -            memset(area + (hpagesize * i), 0, 1);
> > +            char val = *(area + (hpagesize * i));
> > +            memset(area + (hpagesize * i), 0, val);
> 
> Please include a comment in the final patch explaining why we want to
> preserve memory contents.
> 
> In the case of NVDIMM I'm not sure if the memset is needed at
> all.  The
> memory already exists - no new pages need to be allocated by the
> kernel.
> We just want the page table entries to be populated for the NVDIMM
> when
> -mem-prealloc is used.
> 
> Perhaps Andrea or Rik have ideas on improving the kernel interface
> and
> whether mmap(MAP_POPULATE) should be used with NVDIMM instead of this
> userspace "touch every page" workaround?

Why do we need the page table entries to be populated
in advance at all?

The high cost of the page fault for regular memory
is zeroing out the memory pages before we give them
to userspace.

Simply faulting in the NVDIMM memory as it is touched
may make more sense than treating it like DRAM,
especially given that with DAX, NVDIMM areas may be
orders of magnitude larger than RAM, and we really
do not want to set up all the page tables for every
part of the guest DAX "disk".

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-23 10:59 Daniel P. Berrange
  2017-02-23 12:05 ` Michal Privoznik
@ 2017-02-27 11:10 ` Stefan Hajnoczi
  2017-02-27 13:46   ` Rik van Riel
  1 sibling, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2017-02-27 11:10 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, Jitendra Kolhe, Paolo Bonzini, Stefan Hajnoczi,
	Michal Privoznik, Rik van Riel, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]

On Thu, Feb 23, 2017 at 10:59:22AM +0000, Daniel P. Berrange wrote:
> When using a memory-backend object with prealloc turned on, QEMU
> will memset() the first byte in every memory page to zero. While
> this might have been acceptable for memory backends associated
> with RAM, this corrupts application data for NVDIMMs.
> 
> Instead of setting every page to zero, read the current byte
> value and then just write that same value back, so we are not
> corrupting the original data.
> 
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
> 
> I'm unclear if this is actually still safe in practice ? Is the
> compiler permitted to optimize away the read+write since it doesn't
> change the memory value. I'd hope not, but I've been surprised
> before...
> 
> IMHO this is another factor in favour of requesting an API from
> the kernel to provide the prealloc behaviour we want.
> 
>  util/oslib-posix.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 35012b9..8f5b656 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
>  
>          /* MAP_POPULATE silently ignores failures */
>          for (i = 0; i < numpages; i++) {
> -            memset(area + (hpagesize * i), 0, 1);
> +            char val = *(area + (hpagesize * i));
> +            memset(area + (hpagesize * i), 0, val);

Please include a comment in the final patch explaining why we want to
preserve memory contents.

In the case of NVDIMM I'm not sure if the memset is needed at all.  The
memory already exists - no new pages need to be allocated by the kernel.
We just want the page table entries to be populated for the NVDIMM when
-mem-prealloc is used.

Perhaps Andrea or Rik have ideas on improving the kernel interface and
whether mmap(MAP_POPULATE) should be used with NVDIMM instead of this
userspace "touch every page" workaround?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24 12:12         ` Dr. David Alan Gilbert
@ 2017-02-24 12:18           ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2017-02-24 12:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Daniel P. Berrange
  Cc: Michal Privoznik, Jitendra Kolhe, qemu-devel, Stefan Hajnoczi



On 24/02/2017 13:12, Dr. David Alan Gilbert wrote:
>> Ok, yeah, it makes sense that the compiler can optimize that away without
>> volatile. I wonder if adding volatile has much of a performance impact on
>> this loop...
> I don't think we have anything else in QEMU to do it (other than atomic's
> but we don't need this to be atomic).   I don't think the use of memset()
> helps, because the compiler is free to optimise that out as well; so
> I think 'volatile' is a reasonable use (although I seem to have a soft-spot
> for volatile and I know everyone else tells me I'm mad).

Yes, I think it is fine to use volatile here.

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24  9:24       ` Daniel P. Berrange
@ 2017-02-24 12:12         ` Dr. David Alan Gilbert
  2017-02-24 12:18           ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2017-02-24 12:12 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Michal Privoznik, Jitendra Kolhe, Paolo Bonzini, qemu-devel,
	Stefan Hajnoczi

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Fri, Feb 24, 2017 at 10:05:17AM +0100, Michal Privoznik wrote:
> > On 02/23/2017 01:07 PM, Daniel P. Berrange wrote:
> > > On Thu, Feb 23, 2017 at 01:05:33PM +0100, Michal Privoznik wrote:
> > >> On 02/23/2017 11:59 AM, Daniel P. Berrange wrote:
> > >>> When using a memory-backend object with prealloc turned on, QEMU
> > >>> will memset() the first byte in every memory page to zero. While
> > >>> this might have been acceptable for memory backends associated
> > >>> with RAM, this corrupts application data for NVDIMMs.
> > >>>
> > >>> Instead of setting every page to zero, read the current byte
> > >>> value and then just write that same value back, so we are not
> > >>> corrupting the original data.
> > >>>
> > >>> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> > >>> ---
> > >>>
> > >>> I'm unclear if this is actually still safe in practice ? Is the
> > >>> compiler permitted to optimize away the read+write since it doesn't
> > >>> change the memory value. I'd hope not, but I've been surprised
> > >>> before...
> > >>>
> > >>> IMHO this is another factor in favour of requesting an API from
> > >>> the kernel to provide the prealloc behaviour we want.
> > >>>
> > >>>  util/oslib-posix.c | 3 ++-
> > >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > >>> index 35012b9..8f5b656 100644
> > >>> --- a/util/oslib-posix.c
> > >>> +++ b/util/oslib-posix.c
> > >>> @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
> > >>>  
> > >>>          /* MAP_POPULATE silently ignores failures */
> > >>>          for (i = 0; i < numpages; i++) {
> > >>> -            memset(area + (hpagesize * i), 0, 1);
> > >>> +            char val = *(area + (hpagesize * i));
> > >>> +            memset(area + (hpagesize * i), 0, val);
> > >>
> > >> I think you wanted:
> > >>
> > >> memset(area + (hpagesize * i), val, 1);
> > >>
> > >> because what you are suggesting will overwrite even more than the first
> > >> byte with zeroes.
> > > 
> > > Lol, yes, I'm stupid.
> > > 
> > > Anyway, rather than repost this yet, I'm interested if this is actually
> > > the right way to fix the problem or if we should do something totally
> > > different....
> > 
> > So, I've done some analysis and if the optimizations are enabled, this
> > whole body is optimized away. Not the loop though. Here's what I've tested:
> > 
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > 
> > int main(int argc, char *argv[])
> > {
> >     int ret = EXIT_FAILURE;
> >     unsigned char *ptr;
> >     size_t i, j;
> > 
> >     if (!(ptr = malloc(1024 * 4))) {
> >         perror("malloc");
> >         goto cleanup;
> >     }
> > 
> >     for (i = 0; i < 4; i++) {
> >         unsigned char val = ptr[i*1024];
> >         memset(ptr + i * 1024, val, 1);
> >     }
> > 
> >     ret = EXIT_SUCCESS;
> >  cleanup:
> >     free(ptr);
> >     return ret;
> > }
> > 
> > 
> > But if I make @val volatile, I can enforce the compiler to include the
> > body of the loop and actually read and write some bytes. BTW: if I
> > replace memset with *(ptr + i * 1024) = val; and don't make @val
> > volatile even the loop is optimized away.
> 
> Ok, yeah, it makes sense that the compiler can optimize that away without
> volatile. I wonder if adding volatile has much of a performance impact on
> this loop...

I don't think we have anything else in QEMU to do it (other than atomic's
but we don't need this to be atomic).   I don't think the use of memset()
helps, because the compiler is free to optimise that out as well; so
I think 'volatile' is a reasonable use (although I seem to have a soft-spot
for volatile and I know everyone else tells me I'm mad).

However, note the performance of this loop is important as shown by
Jitendra's recent patchset to parallelise it.
I wonder if MADV_WILLNEED makes any difference.

Dave

> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-24  9:05     ` Michal Privoznik
@ 2017-02-24  9:24       ` Daniel P. Berrange
  2017-02-24 12:12         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel P. Berrange @ 2017-02-24  9:24 UTC (permalink / raw)
  To: Michal Privoznik
  Cc: Jitendra Kolhe, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Fri, Feb 24, 2017 at 10:05:17AM +0100, Michal Privoznik wrote:
> On 02/23/2017 01:07 PM, Daniel P. Berrange wrote:
> > On Thu, Feb 23, 2017 at 01:05:33PM +0100, Michal Privoznik wrote:
> >> On 02/23/2017 11:59 AM, Daniel P. Berrange wrote:
> >>> When using a memory-backend object with prealloc turned on, QEMU
> >>> will memset() the first byte in every memory page to zero. While
> >>> this might have been acceptable for memory backends associated
> >>> with RAM, this corrupts application data for NVDIMMs.
> >>>
> >>> Instead of setting every page to zero, read the current byte
> >>> value and then just write that same value back, so we are not
> >>> corrupting the original data.
> >>>
> >>> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> >>> ---
> >>>
> >>> I'm unclear if this is actually still safe in practice ? Is the
> >>> compiler permitted to optimize away the read+write since it doesn't
> >>> change the memory value. I'd hope not, but I've been surprised
> >>> before...
> >>>
> >>> IMHO this is another factor in favour of requesting an API from
> >>> the kernel to provide the prealloc behaviour we want.
> >>>
> >>>  util/oslib-posix.c | 3 ++-
> >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> >>> index 35012b9..8f5b656 100644
> >>> --- a/util/oslib-posix.c
> >>> +++ b/util/oslib-posix.c
> >>> @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
> >>>  
> >>>          /* MAP_POPULATE silently ignores failures */
> >>>          for (i = 0; i < numpages; i++) {
> >>> -            memset(area + (hpagesize * i), 0, 1);
> >>> +            char val = *(area + (hpagesize * i));
> >>> +            memset(area + (hpagesize * i), 0, val);
> >>
> >> I think you wanted:
> >>
> >> memset(area + (hpagesize * i), val, 1);
> >>
> >> because what you are suggesting will overwrite even more than the first
> >> byte with zeroes.
> > 
> > Lol, yes, I'm stupid.
> > 
> > Anyway, rather than repost this yet, I'm interested if this is actually
> > the right way to fix the problem or if we should do something totally
> > different....
> 
> So, I've done some analysis and if the optimizations are enabled, this
> whole body is optimized away. Not the loop though. Here's what I've tested:
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> 
> int main(int argc, char *argv[])
> {
>     int ret = EXIT_FAILURE;
>     unsigned char *ptr;
>     size_t i, j;
> 
>     if (!(ptr = malloc(1024 * 4))) {
>         perror("malloc");
>         goto cleanup;
>     }
> 
>     for (i = 0; i < 4; i++) {
>         unsigned char val = ptr[i*1024];
>         memset(ptr + i * 1024, val, 1);
>     }
> 
>     ret = EXIT_SUCCESS;
>  cleanup:
>     free(ptr);
>     return ret;
> }
> 
> 
> But if I make @val volatile, I can enforce the compiler to include the
> body of the loop and actually read and write some bytes. BTW: if I
> replace memset with *(ptr + i * 1024) = val; and don't make @val
> volatile even the loop is optimized away.

Ok, yeah, it makes sense that the compiler can optimize that away without
volatile. I wonder if adding volatile has much of a performance impact on
this loop...

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-23 12:07   ` Daniel P. Berrange
@ 2017-02-24  9:05     ` Michal Privoznik
  2017-02-24  9:24       ` Daniel P. Berrange
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Privoznik @ 2017-02-24  9:05 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Jitendra Kolhe, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 02/23/2017 01:07 PM, Daniel P. Berrange wrote:
> On Thu, Feb 23, 2017 at 01:05:33PM +0100, Michal Privoznik wrote:
>> On 02/23/2017 11:59 AM, Daniel P. Berrange wrote:
>>> When using a memory-backend object with prealloc turned on, QEMU
>>> will memset() the first byte in every memory page to zero. While
>>> this might have been acceptable for memory backends associated
>>> with RAM, this corrupts application data for NVDIMMs.
>>>
>>> Instead of setting every page to zero, read the current byte
>>> value and then just write that same value back, so we are not
>>> corrupting the original data.
>>>
>>> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
>>> ---
>>>
>>> I'm unclear if this is actually still safe in practice ? Is the
>>> compiler permitted to optimize away the read+write since it doesn't
>>> change the memory value. I'd hope not, but I've been surprised
>>> before...
>>>
>>> IMHO this is another factor in favour of requesting an API from
>>> the kernel to provide the prealloc behaviour we want.
>>>
>>>  util/oslib-posix.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>>> index 35012b9..8f5b656 100644
>>> --- a/util/oslib-posix.c
>>> +++ b/util/oslib-posix.c
>>> @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
>>>  
>>>          /* MAP_POPULATE silently ignores failures */
>>>          for (i = 0; i < numpages; i++) {
>>> -            memset(area + (hpagesize * i), 0, 1);
>>> +            char val = *(area + (hpagesize * i));
>>> +            memset(area + (hpagesize * i), 0, val);
>>
>> I think you wanted:
>>
>> memset(area + (hpagesize * i), val, 1);
>>
>> because what you are suggesting will overwrite even more than the first
>> byte with zeroes.
> 
> Lol, yes, I'm stupid.
> 
> Anyway, rather than repost this yet, I'm interested if this is actually
> the right way to fix the problem or if we should do something totally
> different....

So, I've done some analysis and if the optimizations are enabled, this
whole body is optimized away. Not the loop though. Here's what I've tested:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int ret = EXIT_FAILURE;
    unsigned char *ptr;
    size_t i, j;

    if (!(ptr = malloc(1024 * 4))) {
        perror("malloc");
        goto cleanup;
    }

    for (i = 0; i < 4; i++) {
        unsigned char val = ptr[i*1024];
        memset(ptr + i * 1024, val, 1);
    }

    ret = EXIT_SUCCESS;
 cleanup:
    free(ptr);
    return ret;
}


But if I make @val volatile, I can enforce the compiler to include the
body of the loop and actually read and write some bytes. BTW: if I
replace memset with *(ptr + i * 1024) = val; and don't make @val
volatile even the loop is optimized away.

I was compiling with:

gcc -S -O3 -o test.S test.c

Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-23 12:05 ` Michal Privoznik
@ 2017-02-23 12:07   ` Daniel P. Berrange
  2017-02-24  9:05     ` Michal Privoznik
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel P. Berrange @ 2017-02-23 12:07 UTC (permalink / raw)
  To: Michal Privoznik
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Jitendra Kolhe

On Thu, Feb 23, 2017 at 01:05:33PM +0100, Michal Privoznik wrote:
> On 02/23/2017 11:59 AM, Daniel P. Berrange wrote:
> > When using a memory-backend object with prealloc turned on, QEMU
> > will memset() the first byte in every memory page to zero. While
> > this might have been acceptable for memory backends associated
> > with RAM, this corrupts application data for NVDIMMs.
> > 
> > Instead of setting every page to zero, read the current byte
> > value and then just write that same value back, so we are not
> > corrupting the original data.
> > 
> > Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> > ---
> > 
> > I'm unclear if this is actually still safe in practice ? Is the
> > compiler permitted to optimize away the read+write since it doesn't
> > change the memory value. I'd hope not, but I've been surprised
> > before...
> > 
> > IMHO this is another factor in favour of requesting an API from
> > the kernel to provide the prealloc behaviour we want.
> > 
> >  util/oslib-posix.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > index 35012b9..8f5b656 100644
> > --- a/util/oslib-posix.c
> > +++ b/util/oslib-posix.c
> > @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
> >  
> >          /* MAP_POPULATE silently ignores failures */
> >          for (i = 0; i < numpages; i++) {
> > -            memset(area + (hpagesize * i), 0, 1);
> > +            char val = *(area + (hpagesize * i));
> > +            memset(area + (hpagesize * i), 0, val);
> 
> I think you wanted:
> 
> memset(area + (hpagesize * i), val, 1);
> 
> because what you are suggesting will overwrite even more than the first
> byte with zeroes.

Lol, yes, I'm stupid.

Anyway, rather than repost this yet, I'm interested if this is actually
the right way to fix the problem or if we should do something totally
different....

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
  2017-02-23 10:59 Daniel P. Berrange
@ 2017-02-23 12:05 ` Michal Privoznik
  2017-02-23 12:07   ` Daniel P. Berrange
  2017-02-27 11:10 ` Stefan Hajnoczi
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Privoznik @ 2017-02-23 12:05 UTC (permalink / raw)
  To: Daniel P. Berrange, qemu-devel
  Cc: Paolo Bonzini, Stefan Hajnoczi, Jitendra Kolhe

On 02/23/2017 11:59 AM, Daniel P. Berrange wrote:
> When using a memory-backend object with prealloc turned on, QEMU
> will memset() the first byte in every memory page to zero. While
> this might have been acceptable for memory backends associated
> with RAM, this corrupts application data for NVDIMMs.
> 
> Instead of setting every page to zero, read the current byte
> value and then just write that same value back, so we are not
> corrupting the original data.
> 
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
> 
> I'm unclear if this is actually still safe in practice ? Is the
> compiler permitted to optimize away the read+write since it doesn't
> change the memory value. I'd hope not, but I've been surprised
> before...
> 
> IMHO this is another factor in favour of requesting an API from
> the kernel to provide the prealloc behaviour we want.
> 
>  util/oslib-posix.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 35012b9..8f5b656 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
>  
>          /* MAP_POPULATE silently ignores failures */
>          for (i = 0; i < numpages; i++) {
> -            memset(area + (hpagesize * i), 0, 1);
> +            char val = *(area + (hpagesize * i));
> +            memset(area + (hpagesize * i), 0, val);

I think you wanted:

memset(area + (hpagesize * i), val, 1);

because what you are suggesting will overwrite even more than the first
byte with zeroes.

Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
@ 2017-02-23 10:59 Daniel P. Berrange
  2017-02-23 12:05 ` Michal Privoznik
  2017-02-27 11:10 ` Stefan Hajnoczi
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel P. Berrange @ 2017-02-23 10:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Stefan Hajnoczi, Jitendra Kolhe, Michal Privoznik,
	Daniel P. Berrange

When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.

Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---

I'm unclear if this is actually still safe in practice ? Is the
compiler permitted to optimize away the read+write since it doesn't
change the memory value. I'd hope not, but I've been surprised
before...

IMHO this is another factor in favour of requesting an API from
the kernel to provide the prealloc behaviour we want.

 util/oslib-posix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 35012b9..8f5b656 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
 
         /* MAP_POPULATE silently ignores failures */
         for (i = 0; i < numpages; i++) {
-            memset(area + (hpagesize * i), 0, 1);
+            char val = *(area + (hpagesize * i));
+            memset(area + (hpagesize * i), 0, val);
         }
     }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-02-27 15:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-24 17:27 [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc Daniel P. Berrange
2017-02-24 17:33 ` no-reply
2017-02-27  9:25   ` Daniel P. Berrange
2017-02-24 19:04 ` Eric Blake
2017-02-27 13:28 ` Stefan Hajnoczi
2017-02-27 15:53 ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2017-02-23 10:59 Daniel P. Berrange
2017-02-23 12:05 ` Michal Privoznik
2017-02-23 12:07   ` Daniel P. Berrange
2017-02-24  9:05     ` Michal Privoznik
2017-02-24  9:24       ` Daniel P. Berrange
2017-02-24 12:12         ` Dr. David Alan Gilbert
2017-02-24 12:18           ` Paolo Bonzini
2017-02-27 11:10 ` Stefan Hajnoczi
2017-02-27 13:46   ` Rik van Riel
2017-02-27 13:58     ` Daniel P. Berrange

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.