All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
@ 2010-09-15 17:08 Andrea Arcangeli
  2010-09-15 17:34 ` Alexander Graf
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Andrea Arcangeli @ 2010-09-15 17:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Anthony Liguori

From: Andrea Arcangeli <aarcange@redhat.com>

All allocated guest physical memory shall be marked MADV_DONTFORK, otherwise
fork will fail because of accounting issues preventing migration or netdev_add
when the guest allocated more than half of host physical memory.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/exec.c b/exec.c
index 380dab5..e2bdf19 100644
--- a/exec.c
+++ b/exec.c
@@ -2861,6 +2861,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 #ifdef MADV_MERGEABLE
             madvise(new_block->host, size, MADV_MERGEABLE);
 #endif
+#ifdef MADV_DONTFORK
+            madvise(new_block->host, size, MADV_DONTFORK);
+#endif
         }
     }
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2010-09-15 17:08 [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory Andrea Arcangeli
@ 2010-09-15 17:34 ` Alexander Graf
  2010-09-15 17:37 ` Andreas Färber
  2010-09-15 21:03 ` [Qemu-devel] " Michael S. Tsirkin
  2 siblings, 0 replies; 18+ messages in thread
From: Alexander Graf @ 2010-09-15 17:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Anthony Liguori, qemu-devel


On 15.09.2010, at 19:08, Andrea Arcangeli wrote:

> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> All allocated guest physical memory shall be marked MADV_DONTFORK, otherwise
> fork will fail because of accounting issues preventing migration or netdev_add
> when the guest allocated more than half of host physical memory.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

The madvise logic is just about to get changed now. Please look at Andreas' patches and coordinate with him there.


Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2010-09-15 17:08 [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory Andrea Arcangeli
  2010-09-15 17:34 ` Alexander Graf
@ 2010-09-15 17:37 ` Andreas Färber
  2011-01-05 15:10   ` Andrea Arcangeli
  2010-09-15 21:03 ` [Qemu-devel] " Michael S. Tsirkin
  2 siblings, 1 reply; 18+ messages in thread
From: Andreas Färber @ 2010-09-15 17:37 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Blue Swirl, Anthony Liguori, qemu-devel@nongnu.org Developers

Am 15.09.2010 um 19:08 schrieb Andrea Arcangeli:

> From: Andrea Arcangeli <aarcange@redhat.com>
>
> All allocated guest physical memory shall be marked MADV_DONTFORK,  
> otherwise
> fork will fail because of accounting issues preventing migration or  
> netdev_add
> when the guest allocated more than half of host physical memory.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>
> diff --git a/exec.c b/exec.c
> index 380dab5..e2bdf19 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2861,6 +2861,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState  
> *dev, const char *name,
> #ifdef MADV_MERGEABLE
>             madvise(new_block->host, size, MADV_MERGEABLE);
> #endif
> +#ifdef MADV_DONTFORK
> +            madvise(new_block->host, size, MADV_DONTFORK);
> +#endif

There is a pending patch which introduces qemu_madvise() and will  
eliminate the need for these #ifdefs, which was demanded by Blue. I'll  
cc you on v6.

Regards,

Andreas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] Re: [PATCH] add MADV_DONTFORK to guest physical memory
  2010-09-15 17:08 [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory Andrea Arcangeli
  2010-09-15 17:34 ` Alexander Graf
  2010-09-15 17:37 ` Andreas Färber
@ 2010-09-15 21:03 ` Michael S. Tsirkin
  2010-09-16  6:51   ` Gleb Natapov
  2 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2010-09-15 21:03 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Anthony Liguori, qemu-devel

On Wed, Sep 15, 2010 at 07:08:24PM +0200, Andrea Arcangeli wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> All allocated guest physical memory shall be marked MADV_DONTFORK, otherwise
> fork will fail because of accounting issues preventing migration or netdev_add
> when the guest allocated more than half of host physical memory.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> 
> diff --git a/exec.c b/exec.c
> index 380dab5..e2bdf19 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2861,6 +2861,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
>  #ifdef MADV_MERGEABLE
>              madvise(new_block->host, size, MADV_MERGEABLE);
>  #endif
> +#ifdef MADV_DONTFORK
> +            madvise(new_block->host, size, MADV_DONTFORK);
> +#endif
>          }
>      }

Is this always a number of full host pages? If not, we'd get trouble
when trying to call helpers.

-- 
MST

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Re: [PATCH] add MADV_DONTFORK to guest physical memory
  2010-09-15 21:03 ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-09-16  6:51   ` Gleb Natapov
  0 siblings, 0 replies; 18+ messages in thread
From: Gleb Natapov @ 2010-09-16  6:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Andrea Arcangeli, Anthony Liguori, qemu-devel

On Wed, Sep 15, 2010 at 11:03:14PM +0200, Michael S. Tsirkin wrote:
> On Wed, Sep 15, 2010 at 07:08:24PM +0200, Andrea Arcangeli wrote:
> > From: Andrea Arcangeli <aarcange@redhat.com>
> > 
> > All allocated guest physical memory shall be marked MADV_DONTFORK, otherwise
> > fork will fail because of accounting issues preventing migration or netdev_add
> > when the guest allocated more than half of host physical memory.
> > 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > ---
> > 
> > diff --git a/exec.c b/exec.c
> > index 380dab5..e2bdf19 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2861,6 +2861,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
> >  #ifdef MADV_MERGEABLE
> >              madvise(new_block->host, size, MADV_MERGEABLE);
> >  #endif
> > +#ifdef MADV_DONTFORK
> > +            madvise(new_block->host, size, MADV_DONTFORK);
> > +#endif
> >          }
> >      }
> 
> Is this always a number of full host pages? If not, we'd get trouble
> when trying to call helpers.
> 
How it can be not a number of full host pages? This is guest memory.

--
			Gleb.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2010-09-15 17:37 ` Andreas Färber
@ 2011-01-05 15:10   ` Andrea Arcangeli
  2011-01-05 18:02     ` Michael Roth
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2011-01-05 15:10 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Blue Swirl, Anthony Liguori, qemu-devel@nongnu.org Developers

The bug is still there so I rediffed the old patch against current
code.

On a related topic: could somebody give me advice on how to implement
a command line (command line seems enough, the other option would be
monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
THP working fine but KSM may decrease performance by increasing the
number of copy on write and by splitting hugepages, so we'd like to be
able to turn off KSM on a per-VM basis (not on the whole host, which
of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
that high perf VMs will keep running at maximum speed with KSM off but
others may still benefit from KSM. For that I need to make the below
MADV_MERGEABLE madvise conditional to something and the code itself
will be trivial, we've just to converge on a command line option
(hopefully quickly ;).

======
Subject: avoid allocation failures during fork/exec for migrate/hotplug

From: Andrea Arcangeli <aarcange@redhat.com>

Mark all guest physical memory as MADV_DONTFORK to avoid false positive
allocation failures during fork in migrate/netdev hotplug.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/exec.c b/exec.c
index db9ff55..9e2aa12 100644
--- a/exec.c
+++ b/exec.c
@@ -2851,6 +2851,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
             new_block->host = qemu_vmalloc(size);
 #endif
             qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
+
+            /* no allocation failures during fork/exec for migrate/hotplug */
+            qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
         }
     }
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 15:10   ` Andrea Arcangeli
@ 2011-01-05 18:02     ` Michael Roth
  2011-01-05 19:44       ` Alexander Graf
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Roth @ 2011-01-05 18:02 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Blue Swirl, Andreas Färber, Anthony Liguori,
	qemu-devel@nongnu.org Developers

On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
> The bug is still there so I rediffed the old patch against current
> code.
>
> On a related topic: could somebody give me advice on how to implement
> a command line (command line seems enough, the other option would be
> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
> THP working fine but KSM may decrease performance by increasing the
> number of copy on write and by splitting hugepages, so we'd like to be
> able to turn off KSM on a per-VM basis (not on the whole host, which
> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
> that high perf VMs will keep running at maximum speed with KSM off but
> others may still benefit from KSM. For that I need to make the below
> MADV_MERGEABLE madvise conditional to something and the code itself
> will be trivial, we've just to converge on a command line option
> (hopefully quickly ;).

There was a -mem_prealloc option added a while back to set MAP_POPULATE 
on memory mapped in via the -mem-path option. So an analogous 
-mem_nomerge option or something along that line seems reasonable for 
conditionally unsetting QEMU_MADV_MERGEABLE.

And for consistency you should probably make both your proposed changes 
for -mem-path'd memory as well.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 18:02     ` Michael Roth
@ 2011-01-05 19:44       ` Alexander Graf
  2011-01-05 19:54         ` Andrea Arcangeli
  2011-01-05 20:10         ` Anthony Liguori
  0 siblings, 2 replies; 18+ messages in thread
From: Alexander Graf @ 2011-01-05 19:44 UTC (permalink / raw)
  To: Michael Roth
  Cc: Andrea Arcangeli, Blue Swirl, Andreas Färber,
	Anthony Liguori, qemu-devel@nongnu.org Developers


On 05.01.2011, at 19:02, Michael Roth wrote:

> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>> The bug is still there so I rediffed the old patch against current
>> code.
>> 
>> On a related topic: could somebody give me advice on how to implement
>> a command line (command line seems enough, the other option would be
>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>> THP working fine but KSM may decrease performance by increasing the
>> number of copy on write and by splitting hugepages, so we'd like to be
>> able to turn off KSM on a per-VM basis (not on the whole host, which
>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>> that high perf VMs will keep running at maximum speed with KSM off but
>> others may still benefit from KSM. For that I need to make the below
>> MADV_MERGEABLE madvise conditional to something and the code itself
>> will be trivial, we've just to converge on a command line option
>> (hopefully quickly ;).
> 
> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
> 
> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.

Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?

-mem size=512,populate=on,ksm=off

and default -m to something reasonable with the new syntax.


Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 19:44       ` Alexander Graf
@ 2011-01-05 19:54         ` Andrea Arcangeli
  2011-01-05 20:00           ` Alexander Graf
  2011-01-05 20:26           ` Michael Roth
  2011-01-05 20:10         ` Anthony Liguori
  1 sibling, 2 replies; 18+ messages in thread
From: Andrea Arcangeli @ 2011-01-05 19:54 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Blue Swirl, Andreas Färber,
	qemu-devel@nongnu.org Developers, Michael Roth, Anthony Liguori

Hello everyone,

On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
> 
> On 05.01.2011, at 19:02, Michael Roth wrote:
> 
> > On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
> >> The bug is still there so I rediffed the old patch against current
> >> code.
> >> 
> >> On a related topic: could somebody give me advice on how to implement
> >> a command line (command line seems enough, the other option would be
> >> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
> >> THP working fine but KSM may decrease performance by increasing the
> >> number of copy on write and by splitting hugepages, so we'd like to be
> >> able to turn off KSM on a per-VM basis (not on the whole host, which
> >> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
> >> that high perf VMs will keep running at maximum speed with KSM off but
> >> others may still benefit from KSM. For that I need to make the below
> >> MADV_MERGEABLE madvise conditional to something and the code itself
> >> will be trivial, we've just to converge on a command line option
> >> (hopefully quickly ;).
> > 
> > There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
> > 
> > And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
> 
> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
> 
> -mem size=512,populate=on,ksm=off
> 
> and default -m to something reasonable with the new syntax.

I'm neutral... so feel free to decide what I should implement ;).

One comment on combining -m ksm=off (or -mem_nomerge) with
-mem-path. It seems unnecessary because ksm can't be turned on on
VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
makes sense if used in combination with hugetlbfs (which sets
VM_HUGETLB of course).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 19:54         ` Andrea Arcangeli
@ 2011-01-05 20:00           ` Alexander Graf
  2011-01-05 20:12             ` Anthony Liguori
  2011-01-05 20:15             ` Andrea Arcangeli
  2011-01-05 20:26           ` Michael Roth
  1 sibling, 2 replies; 18+ messages in thread
From: Alexander Graf @ 2011-01-05 20:00 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Blue Swirl, Andreas Färber,
	qemu-devel@nongnu.org Developers, Michael Roth, Anthony Liguori


On 05.01.2011, at 20:54, Andrea Arcangeli wrote:

> Hello everyone,
> 
> On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
>> 
>> On 05.01.2011, at 19:02, Michael Roth wrote:
>> 
>>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>>> The bug is still there so I rediffed the old patch against current
>>>> code.
>>>> 
>>>> On a related topic: could somebody give me advice on how to implement
>>>> a command line (command line seems enough, the other option would be
>>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>>> THP working fine but KSM may decrease performance by increasing the
>>>> number of copy on write and by splitting hugepages, so we'd like to be
>>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>>> that high perf VMs will keep running at maximum speed with KSM off but
>>>> others may still benefit from KSM. For that I need to make the below
>>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>>> will be trivial, we've just to converge on a command line option
>>>> (hopefully quickly ;).
>>> 
>>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>> 
>>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>> 
>> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>> 
>> -mem size=512,populate=on,ksm=off
>> 
>> and default -m to something reasonable with the new syntax.
> 
> I'm neutral... so feel free to decide what I should implement ;).
> 
> One comment on combining -m ksm=off (or -mem_nomerge) with
> -mem-path. It seems unnecessary because ksm can't be turned on on
> VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
> makes sense if used in combination with hugetlbfs (which sets
> VM_HUGETLB of course).

Sure, not all combinations make sense. But "-mem size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH should go along the same lines here too. It'd just be a flag "tph" that defaults to on if available.

That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though.


Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 19:44       ` Alexander Graf
  2011-01-05 19:54         ` Andrea Arcangeli
@ 2011-01-05 20:10         ` Anthony Liguori
  1 sibling, 0 replies; 18+ messages in thread
From: Anthony Liguori @ 2011-01-05 20:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Andrea Arcangeli, Anthony Liguori,
	qemu-devel@nongnu.org Developers, Michael Roth, Blue Swirl,
	Andreas Färber

On 01/05/2011 01:44 PM, Alexander Graf wrote:
> On 05.01.2011, at 19:02, Michael Roth wrote:
>
>    
>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>      
>>> The bug is still there so I rediffed the old patch against current
>>> code.
>>>
>>> On a related topic: could somebody give me advice on how to implement
>>> a command line (command line seems enough, the other option would be
>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>> THP working fine but KSM may decrease performance by increasing the
>>> number of copy on write and by splitting hugepages, so we'd like to be
>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>> that high perf VMs will keep running at maximum speed with KSM off but
>>> others may still benefit from KSM. For that I need to make the below
>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>> will be trivial, we've just to converge on a command line option
>>> (hopefully quickly ;).
>>>        
>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>
>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>>      
> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>
> -mem size=512,populate=on,ksm=off
>
> and default -m to something reasonable with the new syntax.
>    

Yeah, that does make sense.  Maybe we should consider folding it in with 
the -numa option too?

Regards,

Anthony Liguori

>
> Alex
>
>    

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 20:00           ` Alexander Graf
@ 2011-01-05 20:12             ` Anthony Liguori
  2011-01-05 20:15             ` Andrea Arcangeli
  1 sibling, 0 replies; 18+ messages in thread
From: Anthony Liguori @ 2011-01-05 20:12 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Andrea Arcangeli, Blue Swirl, Andreas Färber, Michael Roth,
	qemu-devel@nongnu.org Developers

On 01/05/2011 02:00 PM, Alexander Graf wrote:
> On 05.01.2011, at 20:54, Andrea Arcangeli wrote:
>
>    
>> Hello everyone,
>>
>> On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
>>      
>>> On 05.01.2011, at 19:02, Michael Roth wrote:
>>>
>>>        
>>>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>>>          
>>>>> The bug is still there so I rediffed the old patch against current
>>>>> code.
>>>>>
>>>>> On a related topic: could somebody give me advice on how to implement
>>>>> a command line (command line seems enough, the other option would be
>>>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>>>> THP working fine but KSM may decrease performance by increasing the
>>>>> number of copy on write and by splitting hugepages, so we'd like to be
>>>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>>>> that high perf VMs will keep running at maximum speed with KSM off but
>>>>> others may still benefit from KSM. For that I need to make the below
>>>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>>>> will be trivial, we've just to converge on a command line option
>>>>> (hopefully quickly ;).
>>>>>            
>>>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>>>
>>>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>>>>          
>>> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>>>
>>> -mem size=512,populate=on,ksm=off
>>>
>>> and default -m to something reasonable with the new syntax.
>>>        
>> I'm neutral... so feel free to decide what I should implement ;).
>>
>> One comment on combining -m ksm=off (or -mem_nomerge) with
>> -mem-path. It seems unnecessary because ksm can't be turned on on
>> VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
>> makes sense if used in combination with hugetlbfs (which sets
>> VM_HUGETLB of course).
>>      
> Sure, not all combinations make sense. But "-mem size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH should go along the same lines here too. It'd just be a flag "tph" that defaults to on if available.
>
> That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though.
>    

Where it's really helpful is in the ever elusive configuration file 
format.  -mem becomes:

[mem]
size=1G
path=/dev/shm/vm1.ram
populate=on

Which is nice from a grouping perspective.

Regards,

Anthony Liguori

> Alex
>
>    

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 20:00           ` Alexander Graf
  2011-01-05 20:12             ` Anthony Liguori
@ 2011-01-05 20:15             ` Andrea Arcangeli
  1 sibling, 0 replies; 18+ messages in thread
From: Andrea Arcangeli @ 2011-01-05 20:15 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Blue Swirl, Andreas Färber,
	qemu-devel@nongnu.org Developers, Michael Roth, Anthony Liguori

On Wed, Jan 05, 2011 at 09:00:45PM +0100, Alexander Graf wrote:
> Sure, not all combinations make sense. But "-mem
> size=1G,path=/dev/shm/vm1.ram,populate=on" would make sense, no? TPH

I was referring to Michael's suggestion, sure many options make sense
for mem-path too, in fact populate makes more sense for mem-path
(considering if hugetlbfs allocation fails guest may die if
libhugetlbfs isn't trapping the -ENOMEM and doing munmap creating hole
and replacing hole with regular anoymous memory).

> should go along the same lines here too. It'd just be a flag "tph"
> that defaults to on if available.

THP is already default on (thp only requires us to 2m align the
virtual address of guest physical memory which we should do by default
even for qemu no-kvm). We can add a thp=off switch on the same lines
of ksm=off.

> That way we could also do all the sanity checks in a single place. I
> really like the idea of combining memory management command line
> parameters into a single option :). In the end I'd assume it's
> Anthony's call though.

Ok, let me know ;). Sounds like a lot of trouble compared to
-mem_nomerge, but I agree it may be worth it for the long term.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 19:54         ` Andrea Arcangeli
  2011-01-05 20:00           ` Alexander Graf
@ 2011-01-05 20:26           ` Michael Roth
  2011-01-05 20:35             ` Andrea Arcangeli
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Roth @ 2011-01-05 20:26 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Blue Swirl, Andreas Färber, Anthony Liguori, Alexander Graf,
	qemu-devel@nongnu.org Developers

On 01/05/2011 01:54 PM, Andrea Arcangeli wrote:
> Hello everyone,
>
> On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote:
>>
>> On 05.01.2011, at 19:02, Michael Roth wrote:
>>
>>> On 01/05/2011 09:10 AM, Andrea Arcangeli wrote:
>>>> The bug is still there so I rediffed the old patch against current
>>>> code.
>>>>
>>>> On a related topic: could somebody give me advice on how to implement
>>>> a command line (command line seems enough, the other option would be
>>>> monitor command) to make the MADV_MERGEABLE conditional? I got KSM on
>>>> THP working fine but KSM may decrease performance by increasing the
>>>> number of copy on write and by splitting hugepages, so we'd like to be
>>>> able to turn off KSM on a per-VM basis (not on the whole host, which
>>>> of course we already can by setting /sys/kernel/mm/ksm/run to 0) so
>>>> that high perf VMs will keep running at maximum speed with KSM off but
>>>> others may still benefit from KSM. For that I need to make the below
>>>> MADV_MERGEABLE madvise conditional to something and the code itself
>>>> will be trivial, we've just to converge on a command line option
>>>> (hopefully quickly ;).
>>>
>>> There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE.
>>>
>>> And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
>>
>> Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters?
>>
>> -mem size=512,populate=on,ksm=off
>>
>> and default -m to something reasonable with the new syntax.
>
> I'm neutral... so feel free to decide what I should implement ;).

Have to agree the consolidated approach definitely seems nicer.

>
> One comment on combining -m ksm=off (or -mem_nomerge) with
> -mem-path. It seems unnecessary because ksm can't be turned on on
> VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only
> makes sense if used in combination with hugetlbfs (which sets
> VM_HUGETLB of course).
>

Yah you're right, but I've seen several discussions about using mempath 
for tmpfs/ram-backed files for things like numa/zram/etc so tend to 
think of it as something potentially more than just a hook for 
hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK 
stuff should still be immediately applicable.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 20:26           ` Michael Roth
@ 2011-01-05 20:35             ` Andrea Arcangeli
  2011-01-05 21:27               ` Michael Roth
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2011-01-05 20:35 UTC (permalink / raw)
  To: Michael Roth
  Cc: Blue Swirl, Andreas Färber, Anthony Liguori, Alexander Graf,
	qemu-devel@nongnu.org Developers

On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
> Yah you're right, but I've seen several discussions about using mempath 
> for tmpfs/ram-backed files for things like numa/zram/etc so tend to 
> think of it as something potentially more than just a hook for 
> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK 
> stuff should still be immediately applicable.

Yes, MADV_DONTFORK should be set all on all guest physical memory
without options so I hope the new patch I just posted is fine to stop
the spurious -ENOMEM failures in fork.

The ksm part should go incremental but 99% of it will be about
changing the command line, making the madvise conditional will be
trivial as well.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 20:35             ` Andrea Arcangeli
@ 2011-01-05 21:27               ` Michael Roth
  2011-01-06 17:49                 ` Andrea Arcangeli
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Roth @ 2011-01-05 21:27 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Blue Swirl, Andreas Färber,
	qemu-devel@nongnu.org Developers, Anthony Liguori,
	Alexander Graf

On 01/05/2011 02:35 PM, Andrea Arcangeli wrote:
> On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
>> Yah you're right, but I've seen several discussions about using mempath
>> for tmpfs/ram-backed files for things like numa/zram/etc so tend to
>> think of it as something potentially more than just a hook for
>> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK
>> stuff should still be immediately applicable.
>
> Yes, MADV_DONTFORK should be set all on all guest physical memory
> without options so I hope the new patch I just posted is fine to stop
> the spurious -ENOMEM failures in fork.

The patch in this thread? A couple paths still aren't covered when using 
-mem-path. Something like this should get them all:

diff --git a/exec.c b/exec.c
index 49c28b1..cbdcb16 100644
--- a/exec.c
+++ b/exec.c
@@ -2853,6 +2853,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState 
*dev, const char *name,
  #endif
              qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
          }
+
+        /* no allocation failures during fork/exec for migrate/hotplug */
+        qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
      }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-05 21:27               ` Michael Roth
@ 2011-01-06 17:49                 ` Andrea Arcangeli
  2011-01-06 20:49                   ` Michael Roth
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2011-01-06 17:49 UTC (permalink / raw)
  To: Michael Roth
  Cc: Blue Swirl, Andreas Färber,
	qemu-devel@nongnu.org Developers, Anthony Liguori,
	Alexander Graf

On Wed, Jan 05, 2011 at 03:27:37PM -0600, Michael Roth wrote:
> On 01/05/2011 02:35 PM, Andrea Arcangeli wrote:
> > On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
> >> Yah you're right, but I've seen several discussions about using mempath
> >> for tmpfs/ram-backed files for things like numa/zram/etc so tend to
> >> think of it as something potentially more than just a hook for
> >> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK
> >> stuff should still be immediately applicable.
> >
> > Yes, MADV_DONTFORK should be set all on all guest physical memory
> > without options so I hope the new patch I just posted is fine to stop
> > the spurious -ENOMEM failures in fork.
> 
> The patch in this thread? A couple paths still aren't covered when using 
> -mem-path. Something like this should get them all:

Well the reason of MADV_DONTFORK is to avoid accounting issues with
anonymous memory, mem-path don't have that issue as hugetlbfs skips
the accounting (it has to because hugetlbfs are not always taken from
the regular page allocator). It could be however considered a minor
performance optimization.

Now you mention that you want to use -mem-path for other things too,
so maybe that's why you need it there too. BTW, if you ever need it
for more than hugetlbfs, I'm afraid this MAP_PRIVATE I see when
mem_prealloc isn't set, is going to screw any other potential useful
usage besides hugetlbfs, not exactly sure why it makes any sense to
use MAP_PRIVATE there and not only MAP_SHARED.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
  2011-01-06 17:49                 ` Andrea Arcangeli
@ 2011-01-06 20:49                   ` Michael Roth
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Roth @ 2011-01-06 20:49 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Blue Swirl, Andreas Färber,
	qemu-devel@nongnu.org Developers, Anthony Liguori,
	Alexander Graf

On 01/06/2011 11:49 AM, Andrea Arcangeli wrote:
> On Wed, Jan 05, 2011 at 03:27:37PM -0600, Michael Roth wrote:
>> On 01/05/2011 02:35 PM, Andrea Arcangeli wrote:
>>> On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote:
>>>> Yah you're right, but I've seen several discussions about using mempath
>>>> for tmpfs/ram-backed files for things like numa/zram/etc so tend to
>>>> think of it as something potentially more than just a hook for
>>>> hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK
>>>> stuff should still be immediately applicable.
>>>
>>> Yes, MADV_DONTFORK should be set all on all guest physical memory
>>> without options so I hope the new patch I just posted is fine to stop
>>> the spurious -ENOMEM failures in fork.
>>
>> The patch in this thread? A couple paths still aren't covered when using
>> -mem-path. Something like this should get them all:
>
> Well the reason of MADV_DONTFORK is to avoid accounting issues with
> anonymous memory, mem-path don't have that issue as hugetlbfs skips
> the accounting (it has to because hugetlbfs are not always taken from
> the regular page allocator). It could be however considered a minor
> performance optimization.

That's one case, but there's also a wonky fallback in that path that 
defaults to the normal qemu_vmalloc():

if (mem_path) {
#if defined (__linux__) && !defined(TARGET_S390X)
             new_block->host = file_ram_alloc(new_block, size, mem_path);
             if (!new_block->host) {
                 new_block->host = qemu_vmalloc(size);
                 qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
             }

May make sense to only add coverage for that specific case though. If 
file_ram_alloc() is generalized we could deal with it then.

> Now you mention that you want to use -mem-path for other things too,
> so maybe that's why you need it there too. BTW, if you ever need it
> for more than hugetlbfs, I'm afraid this MAP_PRIVATE I see when
> mem_prealloc isn't set, is going to screw any other potential useful
> usage besides hugetlbfs, not exactly sure why it makes any sense to
> use MAP_PRIVATE there and not only MAP_SHARED.

Not sure either...but if it's another hugetlbfs-ism it shouldn't matter 
since using mem-path for something other than hugetlbfs would ideally be 
configurable with a -mem path=/dev/shm/vm0.mem,hugetlbfs=off or 
something along that line, which wouldn't necessarily set MAP_PRIVATE.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-01-06 20:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-15 17:08 [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory Andrea Arcangeli
2010-09-15 17:34 ` Alexander Graf
2010-09-15 17:37 ` Andreas Färber
2011-01-05 15:10   ` Andrea Arcangeli
2011-01-05 18:02     ` Michael Roth
2011-01-05 19:44       ` Alexander Graf
2011-01-05 19:54         ` Andrea Arcangeli
2011-01-05 20:00           ` Alexander Graf
2011-01-05 20:12             ` Anthony Liguori
2011-01-05 20:15             ` Andrea Arcangeli
2011-01-05 20:26           ` Michael Roth
2011-01-05 20:35             ` Andrea Arcangeli
2011-01-05 21:27               ` Michael Roth
2011-01-06 17:49                 ` Andrea Arcangeli
2011-01-06 20:49                   ` Michael Roth
2011-01-05 20:10         ` Anthony Liguori
2010-09-15 21:03 ` [Qemu-devel] " Michael S. Tsirkin
2010-09-16  6:51   ` Gleb Natapov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.