All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] softmmu/physmem: Use qemu_madvise
@ 2022-03-16  4:04 Andrew Deason
  2022-03-16  4:14 ` Peter Xu
  2022-03-16  7:51 ` David Hildenbrand
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Deason @ 2022-03-16  4:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Philippe Mathieu-Daudé,
	Peter Xu, Dr . David Alan Gilbert, Andrew Deason, Paolo Bonzini

We have a thin wrapper around madvise, called qemu_madvise, which
provides consistent behavior for the !CONFIG_MADVISE case, and works
around some platform-specific quirks (some platforms only provide
posix_madvise, and some don't offer all 'advise' types). This specific
caller of madvise has never used it, tracing back to its original
introduction in commit e0b266f01dd2 ("migration_completion: Take
current state").

Call qemu_madvise here, to follow the same logic as all of our other
madvise callers. This slightly changes the behavior for
!CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
error message), but this is now more consistent with other callers
that use qemu_madvise.

Signed-off-by: Andrew Deason <adeason@sinenomine.net>
---
Looking at the history of commits that touch this madvise() call, it
doesn't _look_ like there's any reason to be directly calling madvise vs
qemu_advise (I don't see anything mentioned), but I'm not sure.

 softmmu/physmem.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 43ae70fbe2..900c692b5e 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
                          rb->idstr, start, length, ret);
             goto err;
 #endif
         }
         if (need_madvise) {
             /* For normal RAM this causes it to be unmapped,
              * for shared memory it causes the local mapping to disappear
              * and to fall back on the file contents (which we just
              * fallocate'd away).
              */
-#if defined(CONFIG_MADVISE)
             if (qemu_ram_is_shared(rb) && rb->fd < 0) {
-                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
+                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
             } else {
-                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
+                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
             }
             if (ret) {
                 ret = -errno;
                 error_report("ram_block_discard_range: Failed to discard range "
                              "%s:%" PRIx64 " +%zx (%d)",
                              rb->idstr, start, length, ret);
                 goto err;
             }
-#else
-            ret = -ENOSYS;
-            error_report("ram_block_discard_range: MADVISE not available"
-                         "%s:%" PRIx64 " +%zx (%d)",
-                         rb->idstr, start, length, ret);
-            goto err;
-#endif
         }
         trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
                                       need_madvise, need_fallocate, ret);
     } else {
         error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
                      "/%zx/" RAM_ADDR_FMT")",
                      rb->idstr, start, length, rb->max_length);
     }
 
 err:
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-16  4:04 [PATCH] softmmu/physmem: Use qemu_madvise Andrew Deason
@ 2022-03-16  4:14 ` Peter Xu
  2022-03-16  7:51 ` David Hildenbrand
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Xu @ 2022-03-16  4:14 UTC (permalink / raw)
  To: Andrew Deason
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Dr . David Alan Gilbert, David Hildenbrand

On Tue, Mar 15, 2022 at 11:04:05PM -0500, Andrew Deason wrote:
> We have a thin wrapper around madvise, called qemu_madvise, which
> provides consistent behavior for the !CONFIG_MADVISE case, and works
> around some platform-specific quirks (some platforms only provide
> posix_madvise, and some don't offer all 'advise' types). This specific
> caller of madvise has never used it, tracing back to its original
> introduction in commit e0b266f01dd2 ("migration_completion: Take
> current state").
> 
> Call qemu_madvise here, to follow the same logic as all of our other
> madvise callers. This slightly changes the behavior for
> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> error message), but this is now more consistent with other callers
> that use qemu_madvise.
> 
> Signed-off-by: Andrew Deason <adeason@sinenomine.net>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-16  4:04 [PATCH] softmmu/physmem: Use qemu_madvise Andrew Deason
  2022-03-16  4:14 ` Peter Xu
@ 2022-03-16  7:51 ` David Hildenbrand
  2022-03-16  9:26   ` Peter Maydell
  1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2022-03-16  7:51 UTC (permalink / raw)
  To: Andrew Deason, qemu-devel
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	Peter Xu, Dr . David Alan Gilbert

On 16.03.22 05:04, Andrew Deason wrote:
> We have a thin wrapper around madvise, called qemu_madvise, which
> provides consistent behavior for the !CONFIG_MADVISE case, and works
> around some platform-specific quirks (some platforms only provide
> posix_madvise, and some don't offer all 'advise' types). This specific
> caller of madvise has never used it, tracing back to its original
> introduction in commit e0b266f01dd2 ("migration_completion: Take
> current state").
> 
> Call qemu_madvise here, to follow the same logic as all of our other
> madvise callers. This slightly changes the behavior for
> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> error message), but this is now more consistent with other callers
> that use qemu_madvise.
> 
> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> ---
> Looking at the history of commits that touch this madvise() call, it
> doesn't _look_ like there's any reason to be directly calling madvise vs
> qemu_advise (I don't see anything mentioned), but I'm not sure.
> 
>  softmmu/physmem.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 43ae70fbe2..900c692b5e 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>                           rb->idstr, start, length, ret);
>              goto err;
>  #endif
>          }
>          if (need_madvise) {
>              /* For normal RAM this causes it to be unmapped,
>               * for shared memory it causes the local mapping to disappear
>               * and to fall back on the file contents (which we just
>               * fallocate'd away).
>               */
> -#if defined(CONFIG_MADVISE)
>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
>              } else {
> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);

posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
then madvise() -- it's not a discard that we need here.

So ram_block_discard_range() would now succeed in environments (BSD?)
where it's supposed to fail.

So AFAIKs this isn't sane.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-16  7:51 ` David Hildenbrand
@ 2022-03-16  9:26   ` Peter Maydell
  2022-03-16  9:37     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2022-03-16  9:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Peter Xu,
	Philippe Mathieu-Daudé,
	Andrew Deason, Paolo Bonzini

On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
>
> On 16.03.22 05:04, Andrew Deason wrote:
> > We have a thin wrapper around madvise, called qemu_madvise, which
> > provides consistent behavior for the !CONFIG_MADVISE case, and works
> > around some platform-specific quirks (some platforms only provide
> > posix_madvise, and some don't offer all 'advise' types). This specific
> > caller of madvise has never used it, tracing back to its original
> > introduction in commit e0b266f01dd2 ("migration_completion: Take
> > current state").
> >
> > Call qemu_madvise here, to follow the same logic as all of our other
> > madvise callers. This slightly changes the behavior for
> > !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> > error message), but this is now more consistent with other callers
> > that use qemu_madvise.
> >
> > Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> > ---
> > Looking at the history of commits that touch this madvise() call, it
> > doesn't _look_ like there's any reason to be directly calling madvise vs
> > qemu_advise (I don't see anything mentioned), but I'm not sure.
> >
> >  softmmu/physmem.c | 12 ++----------
> >  1 file changed, 2 insertions(+), 10 deletions(-)
> >
> > diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > index 43ae70fbe2..900c692b5e 100644
> > --- a/softmmu/physmem.c
> > +++ b/softmmu/physmem.c
> > @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >                           rb->idstr, start, length, ret);
> >              goto err;
> >  #endif
> >          }
> >          if (need_madvise) {
> >              /* For normal RAM this causes it to be unmapped,
> >               * for shared memory it causes the local mapping to disappear
> >               * and to fall back on the file contents (which we just
> >               * fallocate'd away).
> >               */
> > -#if defined(CONFIG_MADVISE)
> >              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> > -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> >              } else {
> > -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>
> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> then madvise() -- it's not a discard that we need here.
>
> So ram_block_discard_range() would now succeed in environments (BSD?)
> where it's supposed to fail.
>
> So AFAIKs this isn't sane.

But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
"this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
doesn't seem to have  MADV_DONTNEED at all; a quick look at the
FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
to its posix_madvise MADV_DONTNEED.

If we need "specifically Linux MADV_DONTNEED semantics" maybe we
should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
right thing or (b) fails, and use qemu_madvise() regardless.

Certainly the current code is pretty fragile to being changed by
people who don't understand the undocumented subtlety behind
the use of a direct madvise() call here.

-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-16  9:26   ` Peter Maydell
@ 2022-03-16  9:37     ` Dr. David Alan Gilbert
  2022-03-16  9:41       ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-16  9:37 UTC (permalink / raw)
  To: Peter Maydell
  Cc: David Hildenbrand, Philippe Mathieu-Daudé,
	Peter Xu, qemu-devel, Andrew Deason, Paolo Bonzini

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> >
> > On 16.03.22 05:04, Andrew Deason wrote:
> > > We have a thin wrapper around madvise, called qemu_madvise, which
> > > provides consistent behavior for the !CONFIG_MADVISE case, and works
> > > around some platform-specific quirks (some platforms only provide
> > > posix_madvise, and some don't offer all 'advise' types). This specific
> > > caller of madvise has never used it, tracing back to its original
> > > introduction in commit e0b266f01dd2 ("migration_completion: Take
> > > current state").
> > >
> > > Call qemu_madvise here, to follow the same logic as all of our other
> > > madvise callers. This slightly changes the behavior for
> > > !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> > > error message), but this is now more consistent with other callers
> > > that use qemu_madvise.
> > >
> > > Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> > > ---
> > > Looking at the history of commits that touch this madvise() call, it
> > > doesn't _look_ like there's any reason to be directly calling madvise vs
> > > qemu_advise (I don't see anything mentioned), but I'm not sure.
> > >
> > >  softmmu/physmem.c | 12 ++----------
> > >  1 file changed, 2 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > > index 43ae70fbe2..900c692b5e 100644
> > > --- a/softmmu/physmem.c
> > > +++ b/softmmu/physmem.c
> > > @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> > >                           rb->idstr, start, length, ret);
> > >              goto err;
> > >  #endif
> > >          }
> > >          if (need_madvise) {
> > >              /* For normal RAM this causes it to be unmapped,
> > >               * for shared memory it causes the local mapping to disappear
> > >               * and to fall back on the file contents (which we just
> > >               * fallocate'd away).
> > >               */
> > > -#if defined(CONFIG_MADVISE)
> > >              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> > > -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > >              } else {
> > > -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> >
> > posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> > then madvise() -- it's not a discard that we need here.
> >
> > So ram_block_discard_range() would now succeed in environments (BSD?)
> > where it's supposed to fail.
> >
> > So AFAIKs this isn't sane.
> 
> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> to its posix_madvise MADV_DONTNEED.
> 
> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> right thing or (b) fails, and use qemu_madvise() regardless.
> 
> Certainly the current code is pretty fragile to being changed by
> people who don't understand the undocumented subtlety behind
> the use of a direct madvise() call here.

Yeh and I'm not sure I can remembe rall the subtleties; there's a big
hairy set of ifdef's in include/qemu/madvise.h that makes
sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
even on platforms that might not define it themselves.

But I think this code is used for things with different degrees
of care about the semantics; e.g. 'balloon' just cares that
it frees memory up and doesn't care about the detailed semantics
that much; so it's probably fine with that.
Postcopy is much more touchy, but then it's only going to be
calling this on Linux anyway (because of the userfault dependency).

Dave

> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-16  9:37     ` Dr. David Alan Gilbert
@ 2022-03-16  9:41       ` David Hildenbrand
  2022-03-16 14:29         ` Andrew Deason
  2022-03-22 16:39         ` Andrew Deason
  0 siblings, 2 replies; 13+ messages in thread
From: David Hildenbrand @ 2022-03-16  9:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Peter Maydell
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Peter Xu, Andrew Deason

On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.maydell@linaro.org) wrote:
>> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 16.03.22 05:04, Andrew Deason wrote:
>>>> We have a thin wrapper around madvise, called qemu_madvise, which
>>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
>>>> around some platform-specific quirks (some platforms only provide
>>>> posix_madvise, and some don't offer all 'advise' types). This specific
>>>> caller of madvise has never used it, tracing back to its original
>>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
>>>> current state").
>>>>
>>>> Call qemu_madvise here, to follow the same logic as all of our other
>>>> madvise callers. This slightly changes the behavior for
>>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
>>>> error message), but this is now more consistent with other callers
>>>> that use qemu_madvise.
>>>>
>>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
>>>> ---
>>>> Looking at the history of commits that touch this madvise() call, it
>>>> doesn't _look_ like there's any reason to be directly calling madvise vs
>>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
>>>>
>>>>  softmmu/physmem.c | 12 ++----------
>>>>  1 file changed, 2 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>>> index 43ae70fbe2..900c692b5e 100644
>>>> --- a/softmmu/physmem.c
>>>> +++ b/softmmu/physmem.c
>>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>>>>                           rb->idstr, start, length, ret);
>>>>              goto err;
>>>>  #endif
>>>>          }
>>>>          if (need_madvise) {
>>>>              /* For normal RAM this causes it to be unmapped,
>>>>               * for shared memory it causes the local mapping to disappear
>>>>               * and to fall back on the file contents (which we just
>>>>               * fallocate'd away).
>>>>               */
>>>> -#if defined(CONFIG_MADVISE)
>>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
>>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
>>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
>>>>              } else {
>>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>>>
>>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
>>> then madvise() -- it's not a discard that we need here.
>>>
>>> So ram_block_discard_range() would now succeed in environments (BSD?)
>>> where it's supposed to fail.
>>>
>>> So AFAIKs this isn't sane.
>>
>> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
>> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
>> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
>> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
>> to its posix_madvise MADV_DONTNEED.
>>
>> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
>> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
>> right thing or (b) fails, and use qemu_madvise() regardless.
>>
>> Certainly the current code is pretty fragile to being changed by
>> people who don't understand the undocumented subtlety behind
>> the use of a direct madvise() call here.
> 
> Yeh and I'm not sure I can remembe rall the subtleties; there's a big
> hairy set of ifdef's in include/qemu/madvise.h that makes
> sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
> even on platforms that might not define it themselves.
> 
> But I think this code is used for things with different degrees
> of care about the semantics; e.g. 'balloon' just cares that
> it frees memory up and doesn't care about the detailed semantics
> that much; so it's probably fine with that.
> Postcopy is much more touchy, but then it's only going to be
> calling this on Linux anyway (because of the userfault dependency).

MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
-- and that's what we want to achieve: ram_block_discard_range()

So I agree with Peter that we might want to make this more explicit.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-16  9:41       ` David Hildenbrand
@ 2022-03-16 14:29         ` Andrew Deason
  2022-03-22 16:39         ` Andrew Deason
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Deason @ 2022-03-16 14:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Maydell, qemu-devel, Philippe Mathieu-Daudé,
	Peter Xu, Dr. David Alan Gilbert, Paolo Bonzini

On Wed, 16 Mar 2022 10:41:41 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
> > * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 16.03.22 05:04, Andrew Deason wrote:
> >>>> We have a thin wrapper around madvise, called qemu_madvise, which
> >>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
> >>>> around some platform-specific quirks (some platforms only provide
> >>>> posix_madvise, and some don't offer all 'advise' types). This specific
> >>>> caller of madvise has never used it, tracing back to its original
> >>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
> >>>> current state").
> >>>>
> >>>> Call qemu_madvise here, to follow the same logic as all of our other
> >>>> madvise callers. This slightly changes the behavior for
> >>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> >>>> error message), but this is now more consistent with other callers
> >>>> that use qemu_madvise.
> >>>>
> >>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> >>>> ---
> >>>> Looking at the history of commits that touch this madvise() call, it
> >>>> doesn't _look_ like there's any reason to be directly calling madvise vs
> >>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
> >>>>
> >>>>  softmmu/physmem.c | 12 ++----------
> >>>>  1 file changed, 2 insertions(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> >>>> index 43ae70fbe2..900c692b5e 100644
> >>>> --- a/softmmu/physmem.c
> >>>> +++ b/softmmu/physmem.c
> >>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >>>>                           rb->idstr, start, length, ret);
> >>>>              goto err;
> >>>>  #endif
> >>>>          }
> >>>>          if (need_madvise) {
> >>>>              /* For normal RAM this causes it to be unmapped,
> >>>>               * for shared memory it causes the local mapping to disappear
> >>>>               * and to fall back on the file contents (which we just
> >>>>               * fallocate'd away).
> >>>>               */
> >>>> -#if defined(CONFIG_MADVISE)
> >>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> >>>>              } else {
> >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> >>>
> >>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> >>> then madvise() -- it's not a discard that we need here.
> >>>
> >>> So ram_block_discard_range() would now succeed in environments (BSD?)
> >>> where it's supposed to fail.
> >>>
> >>> So AFAIKs this isn't sane.
> >>
> >> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> >> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> >> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
> >> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> >> to its posix_madvise MADV_DONTNEED.
> >>
> >> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> >> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> >> right thing or (b) fails, and use qemu_madvise() regardless.
> >>
> >> Certainly the current code is pretty fragile to being changed by
> >> people who don't understand the undocumented subtlety behind
> >> the use of a direct madvise() call here.
> > 
> > Yeh and I'm not sure I can remembe rall the subtleties; there's a big
> > hairy set of ifdef's in include/qemu/madvise.h that makes
> > sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
> > even on platforms that might not define it themselves.
> > 
> > But I think this code is used for things with different degrees
> > of care about the semantics; e.g. 'balloon' just cares that
> > it frees memory up and doesn't care about the detailed semantics
> > that much; so it's probably fine with that.
> > Postcopy is much more touchy, but then it's only going to be
> > calling this on Linux anyway (because of the userfault dependency).
> 
> MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
> -- and that's what we want to achieve: ram_block_discard_range()
> 
> So I agree with Peter that we might want to make this more explicit.

Maybe this could use an argument like QEMU_MADV_DONTNEED_DISCARD, which
gets #define'd to QEMU_MADV_INVALID for posix_madvise, or based on a
configure-time test, or just all non-Linux to be safe? Regardless, it
means "MADV_DONTNEED with discard semantics".

Solaris does have MADV_DONTNEED (possibly new in Solaris 11; I haven't
checked the history), but no MADV_REMOVE. As of 11.4.42 CBE, madvise(3C) says:

       MADV_DONTNEED

           Tell  the  kernel  that  the  specified  address range is no longer
           needed, so the system starts to free the resources associated  with
           the address range.

           MADV_DONTNEED  and  MADV_FREE  perform  related but distinct opera-
           tions. MADV_DONTNEED tries to move  any  data  from  the  specified
           address  range  out  of memory, but it ensures that the contents of
           that range  will  be  recovered  when  they  are  next  referenced.
           MADV_FREE  does not attempt to preserve the contents of the address
           range. As a result, subsequent references to an address range  that
           received  madvise (MADV_DONTNEED) are likely to be slower than ref-
           erences to a range that received madvise (MADV_FREE).

Is there a small test program I could run to see if the semantics are
what we want? Or should we just assume only Linux works here; I assume
that's safer.

-- 
Andrew Deason
adeason@sinenomine.net


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-16  9:41       ` David Hildenbrand
  2022-03-16 14:29         ` Andrew Deason
@ 2022-03-22 16:39         ` Andrew Deason
  2022-03-22 16:43           ` David Hildenbrand
  2022-03-22 16:53           ` Dr. David Alan Gilbert
  1 sibling, 2 replies; 13+ messages in thread
From: Andrew Deason @ 2022-03-22 16:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Maydell, qemu-devel, Philippe Mathieu-Daudé,
	Peter Xu, Dr. David Alan Gilbert, Paolo Bonzini

On Wed, 16 Mar 2022 10:41:41 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
> > * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 16.03.22 05:04, Andrew Deason wrote:
> >>>> We have a thin wrapper around madvise, called qemu_madvise, which
> >>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
> >>>> around some platform-specific quirks (some platforms only provide
> >>>> posix_madvise, and some don't offer all 'advise' types). This specific
> >>>> caller of madvise has never used it, tracing back to its original
> >>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
> >>>> current state").
> >>>>
> >>>> Call qemu_madvise here, to follow the same logic as all of our other
> >>>> madvise callers. This slightly changes the behavior for
> >>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> >>>> error message), but this is now more consistent with other callers
> >>>> that use qemu_madvise.
> >>>>
> >>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> >>>> ---
> >>>> Looking at the history of commits that touch this madvise() call, it
> >>>> doesn't _look_ like there's any reason to be directly calling madvise vs
> >>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
> >>>>
> >>>>  softmmu/physmem.c | 12 ++----------
> >>>>  1 file changed, 2 insertions(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> >>>> index 43ae70fbe2..900c692b5e 100644
> >>>> --- a/softmmu/physmem.c
> >>>> +++ b/softmmu/physmem.c
> >>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >>>>                           rb->idstr, start, length, ret);
> >>>>              goto err;
> >>>>  #endif
> >>>>          }
> >>>>          if (need_madvise) {
> >>>>              /* For normal RAM this causes it to be unmapped,
> >>>>               * for shared memory it causes the local mapping to disappear
> >>>>               * and to fall back on the file contents (which we just
> >>>>               * fallocate'd away).
> >>>>               */
> >>>> -#if defined(CONFIG_MADVISE)
> >>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> >>>>              } else {
> >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> >>>
> >>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> >>> then madvise() -- it's not a discard that we need here.
> >>>
> >>> So ram_block_discard_range() would now succeed in environments (BSD?)
> >>> where it's supposed to fail.
> >>>
> >>> So AFAIKs this isn't sane.
> >>
> >> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> >> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> >> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
> >> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> >> to its posix_madvise MADV_DONTNEED.
> >>
> >> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> >> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> >> right thing or (b) fails, and use qemu_madvise() regardless.
> >>
> >> Certainly the current code is pretty fragile to being changed by
> >> people who don't understand the undocumented subtlety behind
> >> the use of a direct madvise() call here.
> > 
> > Yeh and I'm not sure I can remembe rall the subtleties; there's a big
> > hairy set of ifdef's in include/qemu/madvise.h that makes
> > sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
> > even on platforms that might not define it themselves.
> > 
> > But I think this code is used for things with different degrees
> > of care about the semantics; e.g. 'balloon' just cares that
> > it frees memory up and doesn't care about the detailed semantics
> > that much; so it's probably fine with that.
> > Postcopy is much more touchy, but then it's only going to be
> > calling this on Linux anyway (because of the userfault dependency).
> 
> MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
> -- and that's what we want to achieve: ram_block_discard_range()
> 
> So I agree with Peter that we might want to make this more explicit.

I was looking at the comments/history around this code to try to make
this more explicit/clear, and it seems like the whole function is very
Linux-specific. All we ever do is:

- fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
- madvise(MADV_REMOVE)
- madvise(MADV_DONTNEED) with Linux semantics

All of those operations are Linux-only, so trying to figure out the
cross-platform way to model this seems kind of pointless. Is it fine to
just #ifdef the whole thing to be just for Linux?

-- 
Andrew Deason
adeason@sinenomine.net


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-22 16:39         ` Andrew Deason
@ 2022-03-22 16:43           ` David Hildenbrand
  2022-03-22 16:53           ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2022-03-22 16:43 UTC (permalink / raw)
  To: Andrew Deason
  Cc: Peter Maydell, qemu-devel, Philippe Mathieu-Daudé,
	Peter Xu, Dr. David Alan Gilbert, Paolo Bonzini

On 22.03.22 17:39, Andrew Deason wrote:
> On Wed, 16 Mar 2022 10:41:41 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
>>> * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 16.03.22 05:04, Andrew Deason wrote:
>>>>>> We have a thin wrapper around madvise, called qemu_madvise, which
>>>>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
>>>>>> around some platform-specific quirks (some platforms only provide
>>>>>> posix_madvise, and some don't offer all 'advise' types). This specific
>>>>>> caller of madvise has never used it, tracing back to its original
>>>>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
>>>>>> current state").
>>>>>>
>>>>>> Call qemu_madvise here, to follow the same logic as all of our other
>>>>>> madvise callers. This slightly changes the behavior for
>>>>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
>>>>>> error message), but this is now more consistent with other callers
>>>>>> that use qemu_madvise.
>>>>>>
>>>>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
>>>>>> ---
>>>>>> Looking at the history of commits that touch this madvise() call, it
>>>>>> doesn't _look_ like there's any reason to be directly calling madvise vs
>>>>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
>>>>>>
>>>>>>  softmmu/physmem.c | 12 ++----------
>>>>>>  1 file changed, 2 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>>>>> index 43ae70fbe2..900c692b5e 100644
>>>>>> --- a/softmmu/physmem.c
>>>>>> +++ b/softmmu/physmem.c
>>>>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>>>>>>                           rb->idstr, start, length, ret);
>>>>>>              goto err;
>>>>>>  #endif
>>>>>>          }
>>>>>>          if (need_madvise) {
>>>>>>              /* For normal RAM this causes it to be unmapped,
>>>>>>               * for shared memory it causes the local mapping to disappear
>>>>>>               * and to fall back on the file contents (which we just
>>>>>>               * fallocate'd away).
>>>>>>               */
>>>>>> -#if defined(CONFIG_MADVISE)
>>>>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
>>>>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
>>>>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
>>>>>>              } else {
>>>>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>>>>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>>>>>
>>>>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
>>>>> then madvise() -- it's not a discard that we need here.
>>>>>
>>>>> So ram_block_discard_range() would now succeed in environments (BSD?)
>>>>> where it's supposed to fail.
>>>>>
>>>>> So AFAIKs this isn't sane.
>>>>
>>>> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
>>>> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
>>>> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
>>>> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
>>>> to its posix_madvise MADV_DONTNEED.
>>>>
>>>> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
>>>> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
>>>> right thing or (b) fails, and use qemu_madvise() regardless.
>>>>
>>>> Certainly the current code is pretty fragile to being changed by
>>>> people who don't understand the undocumented subtlety behind
>>>> the use of a direct madvise() call here.
>>>
>>> Yeh and I'm not sure I can remembe rall the subtleties; there's a big
>>> hairy set of ifdef's in include/qemu/madvise.h that makes
>>> sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
>>> even on platforms that might not define it themselves.
>>>
>>> But I think this code is used for things with different degrees
>>> of care about the semantics; e.g. 'balloon' just cares that
>>> it frees memory up and doesn't care about the detailed semantics
>>> that much; so it's probably fine with that.
>>> Postcopy is much more touchy, but then it's only going to be
>>> calling this on Linux anyway (because of the userfault dependency).
>>
>> MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
>> -- and that's what we want to achieve: ram_block_discard_range()
>>
>> So I agree with Peter that we might want to make this more explicit.
> 
> I was looking at the comments/history around this code to try to make
> this more explicit/clear, and it seems like the whole function is very
> Linux-specific. All we ever do is:
> 
> - fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> - madvise(MADV_REMOVE)
> - madvise(MADV_DONTNEED) with Linux semantics
> 
> All of those operations are Linux-only, so trying to figure out the
> cross-platform way to model this seems kind of pointless. Is it fine to
> just #ifdef the whole thing to be just for Linux?
> 

Fine with me, as long as it compiles on other OSs :)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-22 16:39         ` Andrew Deason
  2022-03-22 16:43           ` David Hildenbrand
@ 2022-03-22 16:53           ` Dr. David Alan Gilbert
  2022-03-22 17:34             ` Andrew Deason
  1 sibling, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-22 16:53 UTC (permalink / raw)
  To: Andrew Deason
  Cc: Peter Maydell, David Hildenbrand, Philippe Mathieu-Daudé,
	Peter Xu, qemu-devel, Paolo Bonzini

* Andrew Deason (adeason@sinenomine.net) wrote:
> On Wed, 16 Mar 2022 10:41:41 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
> > On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
> > > * Peter Maydell (peter.maydell@linaro.org) wrote:
> > >> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> > >>>
> > >>> On 16.03.22 05:04, Andrew Deason wrote:
> > >>>> We have a thin wrapper around madvise, called qemu_madvise, which
> > >>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
> > >>>> around some platform-specific quirks (some platforms only provide
> > >>>> posix_madvise, and some don't offer all 'advise' types). This specific
> > >>>> caller of madvise has never used it, tracing back to its original
> > >>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
> > >>>> current state").
> > >>>>
> > >>>> Call qemu_madvise here, to follow the same logic as all of our other
> > >>>> madvise callers. This slightly changes the behavior for
> > >>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> > >>>> error message), but this is now more consistent with other callers
> > >>>> that use qemu_madvise.
> > >>>>
> > >>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> > >>>> ---
> > >>>> Looking at the history of commits that touch this madvise() call, it
> > >>>> doesn't _look_ like there's any reason to be directly calling madvise vs
> > >>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
> > >>>>
> > >>>>  softmmu/physmem.c | 12 ++----------
> > >>>>  1 file changed, 2 insertions(+), 10 deletions(-)
> > >>>>
> > >>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > >>>> index 43ae70fbe2..900c692b5e 100644
> > >>>> --- a/softmmu/physmem.c
> > >>>> +++ b/softmmu/physmem.c
> > >>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> > >>>>                           rb->idstr, start, length, ret);
> > >>>>              goto err;
> > >>>>  #endif
> > >>>>          }
> > >>>>          if (need_madvise) {
> > >>>>              /* For normal RAM this causes it to be unmapped,
> > >>>>               * for shared memory it causes the local mapping to disappear
> > >>>>               * and to fall back on the file contents (which we just
> > >>>>               * fallocate'd away).
> > >>>>               */
> > >>>> -#if defined(CONFIG_MADVISE)
> > >>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > >>>>              } else {
> > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > >>>
> > >>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> > >>> then madvise() -- it's not a discard that we need here.
> > >>>
> > >>> So ram_block_discard_range() would now succeed in environments (BSD?)
> > >>> where it's supposed to fail.
> > >>>
> > >>> So AFAIKs this isn't sane.
> > >>
> > >> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> > >> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> > >> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
> > >> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> > >> to its posix_madvise MADV_DONTNEED.
> > >>
> > >> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> > >> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> > >> right thing or (b) fails, and use qemu_madvise() regardless.
> > >>
> > >> Certainly the current code is pretty fragile to being changed by
> > >> people who don't understand the undocumented subtlety behind
> > >> the use of a direct madvise() call here.
> > > 
> > > Yeh and I'm not sure I can remembe rall the subtleties; there's a big
> > > hairy set of ifdef's in include/qemu/madvise.h that makes
> > > sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
> > > even on platforms that might not define it themselves.
> > > 
> > > But I think this code is used for things with different degrees
> > > of care about the semantics; e.g. 'balloon' just cares that
> > > it frees memory up and doesn't care about the detailed semantics
> > > that much; so it's probably fine with that.
> > > Postcopy is much more touchy, but then it's only going to be
> > > calling this on Linux anyway (because of the userfault dependency).
> > 
> > MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
> > -- and that's what we want to achieve: ram_block_discard_range()
> > 
> > So I agree with Peter that we might want to make this more explicit.
> 
> I was looking at the comments/history around this code to try to make
> this more explicit/clear, and it seems like the whole function is very
> Linux-specific. All we ever do is:
> 
> - fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> - madvise(MADV_REMOVE)
> - madvise(MADV_DONTNEED) with Linux semantics
> 
> All of those operations are Linux-only, so trying to figure out the
> cross-platform way to model this seems kind of pointless. Is it fine to
> just #ifdef the whole thing to be just for Linux?

For ballooning we don't really need Linux semantics; we just need it to
use less host memory.  Postcopy needs the more careful semantics though.

Dave
> -- 
> Andrew Deason
> adeason@sinenomine.net
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-22 16:53           ` Dr. David Alan Gilbert
@ 2022-03-22 17:34             ` Andrew Deason
  2022-03-22 17:58               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Deason @ 2022-03-22 17:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, David Hildenbrand, Philippe Mathieu-Daudé,
	Peter Xu, qemu-devel, Paolo Bonzini

On Tue, 22 Mar 2022 16:53:05 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Andrew Deason (adeason@sinenomine.net) wrote:
> > On Wed, 16 Mar 2022 10:41:41 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> > 
> > > On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
> > > > * Peter Maydell (peter.maydell@linaro.org) wrote:
> > > >> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> > > >>>
> > > >>> On 16.03.22 05:04, Andrew Deason wrote:
> > > >>>> We have a thin wrapper around madvise, called qemu_madvise, which
> > > >>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
> > > >>>> around some platform-specific quirks (some platforms only provide
> > > >>>> posix_madvise, and some don't offer all 'advise' types). This specific
> > > >>>> caller of madvise has never used it, tracing back to its original
> > > >>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
> > > >>>> current state").
> > > >>>>
> > > >>>> Call qemu_madvise here, to follow the same logic as all of our other
> > > >>>> madvise callers. This slightly changes the behavior for
> > > >>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> > > >>>> error message), but this is now more consistent with other callers
> > > >>>> that use qemu_madvise.
> > > >>>>
> > > >>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> > > >>>> ---
> > > >>>> Looking at the history of commits that touch this madvise() call, it
> > > >>>> doesn't _look_ like there's any reason to be directly calling madvise vs
> > > >>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
> > > >>>>
> > > >>>>  softmmu/physmem.c | 12 ++----------
> > > >>>>  1 file changed, 2 insertions(+), 10 deletions(-)
> > > >>>>
> > > >>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > > >>>> index 43ae70fbe2..900c692b5e 100644
> > > >>>> --- a/softmmu/physmem.c
> > > >>>> +++ b/softmmu/physmem.c
> > > >>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> > > >>>>                           rb->idstr, start, length, ret);
> > > >>>>              goto err;
> > > >>>>  #endif
> > > >>>>          }
> > > >>>>          if (need_madvise) {
> > > >>>>              /* For normal RAM this causes it to be unmapped,
> > > >>>>               * for shared memory it causes the local mapping to disappear
> > > >>>>               * and to fall back on the file contents (which we just
> > > >>>>               * fallocate'd away).
> > > >>>>               */
> > > >>>> -#if defined(CONFIG_MADVISE)
> > > >>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> > > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > >>>>              } else {
> > > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > >>>
> > > >>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> > > >>> then madvise() -- it's not a discard that we need here.
> > > >>>
> > > >>> So ram_block_discard_range() would now succeed in environments (BSD?)
> > > >>> where it's supposed to fail.
> > > >>>
> > > >>> So AFAIKs this isn't sane.
> > > >>
> > > >> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> > > >> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> > > >> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
> > > >> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> > > >> to its posix_madvise MADV_DONTNEED.
> > > >>
> > > >> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> > > >> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> > > >> right thing or (b) fails, and use qemu_madvise() regardless.
> > > >>
> > > >> Certainly the current code is pretty fragile to being changed by
> > > >> people who don't understand the undocumented subtlety behind
> > > >> the use of a direct madvise() call here.
> > > > 
> > > > Yeh and I'm not sure I can remembe rall the subtleties; there's a big
> > > > hairy set of ifdef's in include/qemu/madvise.h that makes
> > > > sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
> > > > even on platforms that might not define it themselves.
> > > > 
> > > > But I think this code is used for things with different degrees
> > > > of care about the semantics; e.g. 'balloon' just cares that
> > > > it frees memory up and doesn't care about the detailed semantics
> > > > that much; so it's probably fine with that.
> > > > Postcopy is much more touchy, but then it's only going to be
> > > > calling this on Linux anyway (because of the userfault dependency).
> > > 
> > > MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
> > > -- and that's what we want to achieve: ram_block_discard_range()
> > > 
> > > So I agree with Peter that we might want to make this more explicit.
> > 
> > I was looking at the comments/history around this code to try to make
> > this more explicit/clear, and it seems like the whole function is very
> > Linux-specific. All we ever do is:
> > 
> > - fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> > - madvise(MADV_REMOVE)
> > - madvise(MADV_DONTNEED) with Linux semantics
> > 
> > All of those operations are Linux-only, so trying to figure out the
> > cross-platform way to model this seems kind of pointless. Is it fine to
> > just #ifdef the whole thing to be just for Linux?
> 
> For ballooning we don't really need Linux semantics; we just need it to
> use less host memory.  Postcopy needs the more careful semantics though.

Right, sorry, you mentioned that, but I was thinking it sounded like
that applied to the MADV_DONTNEED path, which is only 1 case of the 3.
But reading into this a bit, maybe all of these cases are fine on
non-Linux for ballooning: fallocate is never called ('rb->fd' is always
-1), and QEMU_MADV_REMOVE falls back to QEMU_MADV_DONTNEED, which is
fine for ballooning. Am I understanding that correctly?

But regardless, it's simpler/more-conservative to make it all
Linux-specific. If I just take ram_block_discard_range away from
non-Linux (that is, make it always return an error), is that breaking
actual functionality, or is it removing a niche code path that non-Linux
isn't really using anyway, and it's not worth the time to figure out? I
am not familiar with qemu internals, so this is not obvious to me.

For context: I'm just trying to get the tree to compile on other
platforms (immediate focus is the guest agent on Solaris). The madvise()
calls here generate warnings due to platform-specific quirks that
qemu_madvise() has some logic to deal with. So my question is whether
to adapt these to the cross-platform qemu_advise(), or treat the
function as platform-specific code.

-- 
Andrew Deason
adeason@sinenomine.net


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-22 17:34             ` Andrew Deason
@ 2022-03-22 17:58               ` Dr. David Alan Gilbert
  2022-03-22 19:35                 ` Andrew Deason
  0 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-22 17:58 UTC (permalink / raw)
  To: Andrew Deason
  Cc: Peter Maydell, David Hildenbrand, Philippe Mathieu-Daudé,
	Peter Xu, qemu-devel, Paolo Bonzini

* Andrew Deason (adeason@sinenomine.net) wrote:
> On Tue, 22 Mar 2022 16:53:05 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Andrew Deason (adeason@sinenomine.net) wrote:
> > > On Wed, 16 Mar 2022 10:41:41 +0100
> > > David Hildenbrand <david@redhat.com> wrote:
> > > 
> > > > On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
> > > > > * Peter Maydell (peter.maydell@linaro.org) wrote:
> > > > >> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> > > > >>>
> > > > >>> On 16.03.22 05:04, Andrew Deason wrote:
> > > > >>>> We have a thin wrapper around madvise, called qemu_madvise, which
> > > > >>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
> > > > >>>> around some platform-specific quirks (some platforms only provide
> > > > >>>> posix_madvise, and some don't offer all 'advise' types). This specific
> > > > >>>> caller of madvise has never used it, tracing back to its original
> > > > >>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
> > > > >>>> current state").
> > > > >>>>
> > > > >>>> Call qemu_madvise here, to follow the same logic as all of our other
> > > > >>>> madvise callers. This slightly changes the behavior for
> > > > >>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> > > > >>>> error message), but this is now more consistent with other callers
> > > > >>>> that use qemu_madvise.
> > > > >>>>
> > > > >>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> > > > >>>> ---
> > > > >>>> Looking at the history of commits that touch this madvise() call, it
> > > > >>>> doesn't _look_ like there's any reason to be directly calling madvise vs
> > > > >>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
> > > > >>>>
> > > > >>>>  softmmu/physmem.c | 12 ++----------
> > > > >>>>  1 file changed, 2 insertions(+), 10 deletions(-)
> > > > >>>>
> > > > >>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > > > >>>> index 43ae70fbe2..900c692b5e 100644
> > > > >>>> --- a/softmmu/physmem.c
> > > > >>>> +++ b/softmmu/physmem.c
> > > > >>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> > > > >>>>                           rb->idstr, start, length, ret);
> > > > >>>>              goto err;
> > > > >>>>  #endif
> > > > >>>>          }
> > > > >>>>          if (need_madvise) {
> > > > >>>>              /* For normal RAM this causes it to be unmapped,
> > > > >>>>               * for shared memory it causes the local mapping to disappear
> > > > >>>>               * and to fall back on the file contents (which we just
> > > > >>>>               * fallocate'd away).
> > > > >>>>               */
> > > > >>>> -#if defined(CONFIG_MADVISE)
> > > > >>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> > > > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > > >>>>              } else {
> > > > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > > >>>
> > > > >>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> > > > >>> then madvise() -- it's not a discard that we need here.
> > > > >>>
> > > > >>> So ram_block_discard_range() would now succeed in environments (BSD?)
> > > > >>> where it's supposed to fail.
> > > > >>>
> > > > >>> So AFAIKs this isn't sane.
> > > > >>
> > > > >> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> > > > >> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> > > > >> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
> > > > >> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> > > > >> to its posix_madvise MADV_DONTNEED.
> > > > >>
> > > > >> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> > > > >> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> > > > >> right thing or (b) fails, and use qemu_madvise() regardless.
> > > > >>
> > > > >> Certainly the current code is pretty fragile to being changed by
> > > > >> people who don't understand the undocumented subtlety behind
> > > > >> the use of a direct madvise() call here.
> > > > > 
> > > > > Yeh and I'm not sure I can remembe rall the subtleties; there's a big
> > > > > hairy set of ifdef's in include/qemu/madvise.h that makes
> > > > > sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
> > > > > even on platforms that might not define it themselves.
> > > > > 
> > > > > But I think this code is used for things with different degrees
> > > > > of care about the semantics; e.g. 'balloon' just cares that
> > > > > it frees memory up and doesn't care about the detailed semantics
> > > > > that much; so it's probably fine with that.
> > > > > Postcopy is much more touchy, but then it's only going to be
> > > > > calling this on Linux anyway (because of the userfault dependency).
> > > > 
> > > > MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
> > > > -- and that's what we want to achieve: ram_block_discard_range()
> > > > 
> > > > So I agree with Peter that we might want to make this more explicit.
> > > 
> > > I was looking at the comments/history around this code to try to make
> > > this more explicit/clear, and it seems like the whole function is very
> > > Linux-specific. All we ever do is:
> > > 
> > > - fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> > > - madvise(MADV_REMOVE)
> > > - madvise(MADV_DONTNEED) with Linux semantics
> > > 
> > > All of those operations are Linux-only, so trying to figure out the
> > > cross-platform way to model this seems kind of pointless. Is it fine to
> > > just #ifdef the whole thing to be just for Linux?
> > 
> > For ballooning we don't really need Linux semantics; we just need it to
> > use less host memory.  Postcopy needs the more careful semantics though.
> 
> Right, sorry, you mentioned that, but I was thinking it sounded like
> that applied to the MADV_DONTNEED path, which is only 1 case of the 3.
> But reading into this a bit, maybe all of these cases are fine on
> non-Linux for ballooning: fallocate is never called ('rb->fd' is always
> -1), and QEMU_MADV_REMOVE falls back to QEMU_MADV_DONTNEED, which is
> fine for ballooning. Am I understanding that correctly?
> 
> But regardless, it's simpler/more-conservative to make it all
> Linux-specific. If I just take ram_block_discard_range away from
> non-Linux (that is, make it always return an error), is that breaking
> actual functionality, or is it removing a niche code path that non-Linux
> isn't really using anyway, and it's not worth the time to figure out? I
> am not familiar with qemu internals, so this is not obvious to me.
> 
> For context: I'm just trying to get the tree to compile on other
> platforms (immediate focus is the guest agent on Solaris). The madvise()
> calls here generate warnings due to platform-specific quirks that
> qemu_madvise() has some logic to deal with. So my question is whether
> to adapt these to the cross-platform qemu_advise(), or treat the
> function as platform-specific code.

I think you should keep this function to have the basic functionality of
freeing a chunk of memory so that it takes less host space up; that way
a guest using virtio-balloon will still be nice to the host.

Dave

> -- 
> Andrew Deason
> adeason@sinenomine.net
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] softmmu/physmem: Use qemu_madvise
  2022-03-22 17:58               ` Dr. David Alan Gilbert
@ 2022-03-22 19:35                 ` Andrew Deason
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Deason @ 2022-03-22 19:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, David Hildenbrand, Philippe Mathieu-Daudé,
	Peter Xu, qemu-devel, Paolo Bonzini

On Tue, 22 Mar 2022 17:58:19 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Andrew Deason (adeason@sinenomine.net) wrote:
> > On Tue, 22 Mar 2022 16:53:05 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > 
> > > * Andrew Deason (adeason@sinenomine.net) wrote:
> > > > On Wed, 16 Mar 2022 10:41:41 +0100
> > > > David Hildenbrand <david@redhat.com> wrote:
> > > > 
> > > > > On 16.03.22 10:37, Dr. David Alan Gilbert wrote:
> > > > > > * Peter Maydell (peter.maydell@linaro.org) wrote:
> > > > > >> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> > > > > >>>
> > > > > >>> On 16.03.22 05:04, Andrew Deason wrote:
> > > > > >>>> We have a thin wrapper around madvise, called qemu_madvise, which
> > > > > >>>> provides consistent behavior for the !CONFIG_MADVISE case, and works
> > > > > >>>> around some platform-specific quirks (some platforms only provide
> > > > > >>>> posix_madvise, and some don't offer all 'advise' types). This specific
> > > > > >>>> caller of madvise has never used it, tracing back to its original
> > > > > >>>> introduction in commit e0b266f01dd2 ("migration_completion: Take
> > > > > >>>> current state").
> > > > > >>>>
> > > > > >>>> Call qemu_madvise here, to follow the same logic as all of our other
> > > > > >>>> madvise callers. This slightly changes the behavior for
> > > > > >>>> !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> > > > > >>>> error message), but this is now more consistent with other callers
> > > > > >>>> that use qemu_madvise.
> > > > > >>>>
> > > > > >>>> Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> > > > > >>>> ---
> > > > > >>>> Looking at the history of commits that touch this madvise() call, it
> > > > > >>>> doesn't _look_ like there's any reason to be directly calling madvise vs
> > > > > >>>> qemu_advise (I don't see anything mentioned), but I'm not sure.
> > > > > >>>>
> > > > > >>>>  softmmu/physmem.c | 12 ++----------
> > > > > >>>>  1 file changed, 2 insertions(+), 10 deletions(-)
> > > > > >>>>
> > > > > >>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > > > > >>>> index 43ae70fbe2..900c692b5e 100644
> > > > > >>>> --- a/softmmu/physmem.c
> > > > > >>>> +++ b/softmmu/physmem.c
> > > > > >>>> @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> > > > > >>>>                           rb->idstr, start, length, ret);
> > > > > >>>>              goto err;
> > > > > >>>>  #endif
> > > > > >>>>          }
> > > > > >>>>          if (need_madvise) {
> > > > > >>>>              /* For normal RAM this causes it to be unmapped,
> > > > > >>>>               * for shared memory it causes the local mapping to disappear
> > > > > >>>>               * and to fall back on the file contents (which we just
> > > > > >>>>               * fallocate'd away).
> > > > > >>>>               */
> > > > > >>>> -#if defined(CONFIG_MADVISE)
> > > > > >>>>              if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> > > > > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > > > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > > > >>>>              } else {
> > > > > >>>> -                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > > > >>>> +                ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > > > >>>
> > > > > >>> posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> > > > > >>> then madvise() -- it's not a discard that we need here.
> > > > > >>>
> > > > > >>> So ram_block_discard_range() would now succeed in environments (BSD?)
> > > > > >>> where it's supposed to fail.
> > > > > >>>
> > > > > >>> So AFAIKs this isn't sane.
> > > > > >>
> > > > > >> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> > > > > >> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> > > > > >> doesn't seem to have  MADV_DONTNEED at all; a quick look at the
> > > > > >> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> > > > > >> to its posix_madvise MADV_DONTNEED.
> > > > > >>
> > > > > >> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> > > > > >> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> > > > > >> right thing or (b) fails, and use qemu_madvise() regardless.
> > > > > >>
> > > > > >> Certainly the current code is pretty fragile to being changed by
> > > > > >> people who don't understand the undocumented subtlety behind
> > > > > >> the use of a direct madvise() call here.
> > > > > > 
> > > > > > Yeh and I'm not sure I can remembe rall the subtleties; there's a big
> > > > > > hairy set of ifdef's in include/qemu/madvise.h that makes
> > > > > > sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
> > > > > > even on platforms that might not define it themselves.
> > > > > > 
> > > > > > But I think this code is used for things with different degrees
> > > > > > of care about the semantics; e.g. 'balloon' just cares that
> > > > > > it frees memory up and doesn't care about the detailed semantics
> > > > > > that much; so it's probably fine with that.
> > > > > > Postcopy is much more touchy, but then it's only going to be
> > > > > > calling this on Linux anyway (because of the userfault dependency).
> > > > > 
> > > > > MADV_DONTNEED/MADV_REMOVE only provides discard semantics on Linux IIRC
> > > > > -- and that's what we want to achieve: ram_block_discard_range()
> > > > > 
> > > > > So I agree with Peter that we might want to make this more explicit.
> > > > 
> > > > I was looking at the comments/history around this code to try to make
> > > > this more explicit/clear, and it seems like the whole function is very
> > > > Linux-specific. All we ever do is:
> > > > 
> > > > - fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> > > > - madvise(MADV_REMOVE)
> > > > - madvise(MADV_DONTNEED) with Linux semantics
> > > > 
> > > > All of those operations are Linux-only, so trying to figure out the
> > > > cross-platform way to model this seems kind of pointless. Is it fine to
> > > > just #ifdef the whole thing to be just for Linux?
> > > 
> > > For ballooning we don't really need Linux semantics; we just need it to
> > > use less host memory.  Postcopy needs the more careful semantics though.
> > 
> > Right, sorry, you mentioned that, but I was thinking it sounded like
> > that applied to the MADV_DONTNEED path, which is only 1 case of the 3.
> > But reading into this a bit, maybe all of these cases are fine on
> > non-Linux for ballooning: fallocate is never called ('rb->fd' is always
> > -1), and QEMU_MADV_REMOVE falls back to QEMU_MADV_DONTNEED, which is
> > fine for ballooning. Am I understanding that correctly?
> > 
> > But regardless, it's simpler/more-conservative to make it all
> > Linux-specific. If I just take ram_block_discard_range away from
> > non-Linux (that is, make it always return an error), is that breaking
> > actual functionality, or is it removing a niche code path that non-Linux
> > isn't really using anyway, and it's not worth the time to figure out? I
> > am not familiar with qemu internals, so this is not obvious to me.
> > 
> > For context: I'm just trying to get the tree to compile on other
> > platforms (immediate focus is the guest agent on Solaris). The madvise()
> > calls here generate warnings due to platform-specific quirks that
> > qemu_madvise() has some logic to deal with. So my question is whether
> > to adapt these to the cross-platform qemu_advise(), or treat the
> > function as platform-specific code.
> 
> I think you should keep this function to have the basic functionality of
> freeing a chunk of memory so that it takes less host space up; that way
> a guest using virtio-balloon will still be nice to the host.

Okay, but the postcopy code path still requires the linux semantics
(that is, that accessing the range afterwards triggers a fault). That
sounds like having two functions is in order, to indicate which
semantics are needed? Something like:

- int ram_block_discard_range(...). Existing semantics; pages must be
  unmapped, or return an error. Called by postcopy stuff.

- void ram_block_try_discard_range(...). Best-effort, just try to reduce
  memory usage. Called by balloon stuff.

There are several callers in hw/virtio/virtio-mem.c that do look at the
return code. Do these require the pages be unmapped? (the linux
semantics)

-- 
Andrew Deason
adeason@sinenomine.net


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-03-22 19:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-16  4:04 [PATCH] softmmu/physmem: Use qemu_madvise Andrew Deason
2022-03-16  4:14 ` Peter Xu
2022-03-16  7:51 ` David Hildenbrand
2022-03-16  9:26   ` Peter Maydell
2022-03-16  9:37     ` Dr. David Alan Gilbert
2022-03-16  9:41       ` David Hildenbrand
2022-03-16 14:29         ` Andrew Deason
2022-03-22 16:39         ` Andrew Deason
2022-03-22 16:43           ` David Hildenbrand
2022-03-22 16:53           ` Dr. David Alan Gilbert
2022-03-22 17:34             ` Andrew Deason
2022-03-22 17:58               ` Dr. David Alan Gilbert
2022-03-22 19:35                 ` Andrew Deason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.