* [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
@ 2018-04-19  7:21 David Gibson
  2018-04-19 12:33 ` Igor Mammedov
  2018-04-19 16:30 ` [Qemu-devel] " Greg Kurz
  0 siblings, 2 replies; 12+ messages in thread
From: David Gibson @ 2018-04-19  7:21 UTC (permalink / raw)
  To: pbonzini, imammedo, ehabkost
  Cc: groug, clg, qemu-ppc, qemu-devel, David Gibson

If the -mem-path option is set, we attempt to map the guest's RAM from a
file in the given path; it's usually used to back guest RAM with hugepages.
If we're unable to (e.g. not enough free hugepages) then we fall back to
allocating normal anonymous pages.  This behaviour can be surprising, but a
comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
we can't change.

What really isn't ok, though, is that in this case we leave mem_path set.
That means functions which attempt to determine the pagesize of main RAM
can erroneously think it is hugepage based on the requested path, even
though it's not.

This is particularly bad for the pseries machine type.  KVM HV limitations
mean the guest can't use pagesizes larger than the host page size used to
back RAM.  That means that such a fallback, rather than merely giving
poorer performance that expected will cause the guest to freeze up early in
boot as it attempts to use large page mappings that can't work.

This patch addresses the problem by clearing the mem_path variable when we
fall back to anonymous pages, meaning that subsequent attempts to
determine the RAM page size will get an accurate result.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

Paolo et al, as with my earlier patches adding some extensions to the
helpers for determining backing page sizes, if there are no objections
can I get an ack to merge this via my ppc tree?

diff --git a/numa.c b/numa.c
index 1116c90af9..78a869e598 100644
--- a/numa.c
+++ b/numa.c
@@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
             /* Legacy behavior: if allocation failed, fall back to
              * regular RAM allocation.
              */
+            mem_path = NULL;
             memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
         }
 #else
-- 
2.14.3
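
For readers following the thread, the effect of the one-line change can be
sketched with a small standalone model. This is not QEMU code: the page
sizes, alloc_from_file(), allocate_ram(), ram_page_size() and
reported_page_size() are hypothetical stand-ins, and only the idea (a page
size helper that trusts mem_path, plus a fallback that does or does not
clear it) comes from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BASE_PAGE_SIZE (4UL * 1024)         /* assumed host base page size */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)  /* assumed hugepage size */

/* Stand-in for the global that -mem-path sets. */
static const char *mem_path;

/* Hypothetical: file-backed allocation always fails (no free hugepages). */
static bool alloc_from_file(const char *path)
{
    (void)path;
    return false;
}

/* Simplified fallback logic; clear_on_fallback models the patch. */
static void allocate_ram(bool clear_on_fallback)
{
    if (mem_path && !alloc_from_file(mem_path)) {
        if (clear_on_fallback) {
            mem_path = NULL;    /* the one-line fix */
        }
        /* the anonymous RAM allocation would happen here */
    }
}

/* Simplified helper that infers the page size from mem_path alone. */
static unsigned long ram_page_size(void)
{
    return mem_path ? HUGE_PAGE_SIZE : BASE_PAGE_SIZE;
}

/* Request hugepages, fall back, then report the page size seen afterwards. */
static unsigned long reported_page_size(bool clear_on_fallback)
{
    mem_path = "/dev/hugepages";
    allocate_ram(clear_on_fallback);
    return ram_page_size();
}

/* Usage: the stale path misreports the size, the cleared path does not. */
void check_fallback_model(void)
{
    assert(reported_page_size(false) == HUGE_PAGE_SIZE);
    assert(reported_page_size(true) == BASE_PAGE_SIZE);
}
```

In this model the unpatched path keeps reporting the hugepage size even
though RAM ended up on base pages, which is the mismatch the commit message
describes.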


* Re: [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19  7:21 [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation David Gibson
@ 2018-04-19 12:33 ` Igor Mammedov
  2018-04-19 12:58   ` [Qemu-devel] [qemu-s390x] " Cornelia Huck
  2018-04-19 16:30 ` [Qemu-devel] " Greg Kurz
  1 sibling, 1 reply; 12+ messages in thread
From: Igor Mammedov @ 2018-04-19 12:33 UTC (permalink / raw)
  To: David Gibson
  Cc: pbonzini, ehabkost, qemu-devel, qemu-ppc, groug, clg,
	David Hildenbrand, qemu-s390x

On Thu, 19 Apr 2018 17:21:23 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> If the -mem-path option is set, we attempt to map the guest's RAM from a
> file in the given path; it's usually used to back guest RAM with hugepages.
> If we're unable to (e.g. not enough free hugepages) then we fall back to
> allocating normal anonymous pages.  This behaviour can be surprising, but a
> comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
> we can't change.
> 
> What really isn't ok, though, is that in this case we leave mem_path set.
> That means functions which attempt to determine the pagesize of main RAM
> can erroneously think it is hugepage based on the requested path, even
> though it's not.
> 
> This is particular bad for the pseries machine type.  KVM HV limitations
> mean the guest can't use pagesizes larger than the host page size used to
> back RAM.  That means that such a fallback, rather than merely giving
> poorer performance that expected will cause the guest to freeze up early in
> boot as it attempts to use large page mappings that can't work.
> 
> This patch addresses the problem by clearing the mem_path variable when we
> fall back to anonymous pages, meaning that subsequent attempts to
> determine the RAM page size will get an accurate result.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  numa.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> Paolo et al, as with my earlier patches adding some extensions to the
> helpers for determining backing page sizes, if there are no objections
> can I get an ack to merge this via my ppc tree?
> 
> diff --git a/numa.c b/numa.c
> index 1116c90af9..78a869e598 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>              /* Legacy behavior: if allocation failed, fall back to
>               * regular RAM allocation.
>               */
> +            mem_path = NULL;
>              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
>          }
>  #else

mem_path is also used by kvm_s390_apply_cpu_model(), and in ccw_init()
memory is initialized before the CPUs are. So if QEMU was started with
-mem-path, then before this patch the created CPU won't have CMM enabled
and will print the warning:
  
 "CMM will not be enabled because it is not compatible with hugetlbfs."

After this patch it might enable CMM, because we clear mem_path.
So the question is: do we care about this?

PS:
CCing s390 folks.


* Re: [Qemu-devel] [qemu-s390x] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19 12:33 ` Igor Mammedov
@ 2018-04-19 12:58   ` Cornelia Huck
  2018-04-19 13:34     ` Christian Borntraeger
  0 siblings, 1 reply; 12+ messages in thread
From: Cornelia Huck @ 2018-04-19 12:58 UTC (permalink / raw)
  To: Igor Mammedov, Christian Borntraeger
  Cc: David Gibson, ehabkost, David Hildenbrand, groug, qemu-devel,
	qemu-s390x, qemu-ppc, clg, pbonzini

On Thu, 19 Apr 2018 14:33:18 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Thu, 19 Apr 2018 17:21:23 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > If the -mem-path option is set, we attempt to map the guest's RAM from a
> > file in the given path; it's usually used to back guest RAM with hugepages.
> > If we're unable to (e.g. not enough free hugepages) then we fall back to
> > allocating normal anonymous pages.  This behaviour can be surprising, but a
> > comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
> > we can't change.
> > 
> > What really isn't ok, though, is that in this case we leave mem_path set.
> > That means functions which attempt to determine the pagesize of main RAM
> > can erroneously think it is hugepage based on the requested path, even
> > though it's not.
> > 
> > This is particular bad for the pseries machine type.  KVM HV limitations
> > mean the guest can't use pagesizes larger than the host page size used to
> > back RAM.  That means that such a fallback, rather than merely giving
> > poorer performance that expected will cause the guest to freeze up early in
> > boot as it attempts to use large page mappings that can't work.
> > 
> > This patch addresses the problem by clearing the mem_path variable when we
> > fall back to anonymous pages, meaning that subsequent attempts to
> > determine the RAM page size will get an accurate result.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  numa.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > Paolo et al, as with my earlier patches adding some extensions to the
> > helpers for determining backing page sizes, if there are no objections
> > can I get an ack to merge this via my ppc tree?
> > 
> > diff --git a/numa.c b/numa.c
> > index 1116c90af9..78a869e598 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> >              /* Legacy behavior: if allocation failed, fall back to
> >               * regular RAM allocation.
> >               */
> > +            mem_path = NULL;
> >              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
> >          }
> >  #else  
> 
> mem_path is also used by kvm_s390_apply_cpu_model(),
> and in ccw_init() memory is initialized before CPUs are
> so if QEM was started with -mem-path, then before patch
> created CPU won't have CMM enabled and print warning:
>   
>  "CMM will not be enabled because it is not compatible with hugetlbfs."
> 
> and after patch it might enable CMM if we clear mem_path.
> So question is do we care about this?

I don't quite remember the cmm semantics here -- Christian?


* Re: [Qemu-devel] [qemu-s390x] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19 12:58   ` [Qemu-devel] [qemu-s390x] " Cornelia Huck
@ 2018-04-19 13:34     ` Christian Borntraeger
  2018-04-19 14:11       ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2018-04-19 13:34 UTC (permalink / raw)
  To: Cornelia Huck, Igor Mammedov
  Cc: David Gibson, ehabkost, David Hildenbrand, groug, qemu-devel,
	qemu-s390x, qemu-ppc, clg, pbonzini



On 04/19/2018 02:58 PM, Cornelia Huck wrote:
> On Thu, 19 Apr 2018 14:33:18 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
>> On Thu, 19 Apr 2018 17:21:23 +1000
>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>
>>> If the -mem-path option is set, we attempt to map the guest's RAM from a
>>> file in the given path; it's usually used to back guest RAM with hugepages.
>>> If we're unable to (e.g. not enough free hugepages) then we fall back to
>>> allocating normal anonymous pages.  This behaviour can be surprising, but a
>>> comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
>>> we can't change.
>>>
>>> What really isn't ok, though, is that in this case we leave mem_path set.
>>> That means functions which attempt to determine the pagesize of main RAM
>>> can erroneously think it is hugepage based on the requested path, even
>>> though it's not.
>>>
>>> This is particular bad for the pseries machine type.  KVM HV limitations
>>> mean the guest can't use pagesizes larger than the host page size used to
>>> back RAM.  That means that such a fallback, rather than merely giving
>>> poorer performance that expected will cause the guest to freeze up early in
>>> boot as it attempts to use large page mappings that can't work.
>>>
>>> This patch addresses the problem by clearing the mem_path variable when we
>>> fall back to anonymous pages, meaning that subsequent attempts to
>>> determine the RAM page size will get an accurate result.
>>>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>>  numa.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> Paolo et al, as with my earlier patches adding some extensions to the
>>> helpers for determining backing page sizes, if there are no objections
>>> can I get an ack to merge this via my ppc tree?
>>>
>>> diff --git a/numa.c b/numa.c
>>> index 1116c90af9..78a869e598 100644
>>> --- a/numa.c
>>> +++ b/numa.c
>>> @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>>>              /* Legacy behavior: if allocation failed, fall back to
>>>               * regular RAM allocation.
>>>               */
>>> +            mem_path = NULL;
>>>              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
>>>          }
>>>  #else  
>>
>> mem_path is also used by kvm_s390_apply_cpu_model(),
>> and in ccw_init() memory is initialized before CPUs are
>> so if QEM was started with -mem-path, then before patch
>> created CPU won't have CMM enabled and print warning:
>>   
>>  "CMM will not be enabled because it is not compatible with hugetlbfs."
>>
>> and after patch it might enable CMM if we clear mem_path.
>> So question is do we care about this?
> 
> I don't quite remember the cmm semantics here -- Christian?

The CMMA interface does not work on large pages. I think the kernel will react
with EFAULT in some cases (cmma migration and others) so qemu will probably fail
unexpectedly. 

But this patch seems to only clear mem-path if we do not allocate at all from
hugetlbfs. So things should be ok, no?


* Re: [Qemu-devel] [qemu-s390x] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19 13:34     ` Christian Borntraeger
@ 2018-04-19 14:11       ` David Hildenbrand
  2018-04-19 16:08         ` Greg Kurz
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2018-04-19 14:11 UTC (permalink / raw)
  To: Christian Borntraeger, Cornelia Huck, Igor Mammedov
  Cc: David Gibson, ehabkost, groug, qemu-devel, qemu-s390x, qemu-ppc,
	clg, pbonzini

On 19.04.2018 15:34, Christian Borntraeger wrote:
> 
> 
> On 04/19/2018 02:58 PM, Cornelia Huck wrote:
>> On Thu, 19 Apr 2018 14:33:18 +0200
>> Igor Mammedov <imammedo@redhat.com> wrote:
>>
>>> On Thu, 19 Apr 2018 17:21:23 +1000
>>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>>
>>>> If the -mem-path option is set, we attempt to map the guest's RAM from a
>>>> file in the given path; it's usually used to back guest RAM with hugepages.
>>>> If we're unable to (e.g. not enough free hugepages) then we fall back to
>>>> allocating normal anonymous pages.  This behaviour can be surprising, but a
>>>> comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
>>>> we can't change.
>>>>
>>>> What really isn't ok, though, is that in this case we leave mem_path set.
>>>> That means functions which attempt to determine the pagesize of main RAM
>>>> can erroneously think it is hugepage based on the requested path, even
>>>> though it's not.
>>>>
>>>> This is particular bad for the pseries machine type.  KVM HV limitations
>>>> mean the guest can't use pagesizes larger than the host page size used to
>>>> back RAM.  That means that such a fallback, rather than merely giving
>>>> poorer performance that expected will cause the guest to freeze up early in
>>>> boot as it attempts to use large page mappings that can't work.
>>>>
>>>> This patch addresses the problem by clearing the mem_path variable when we
>>>> fall back to anonymous pages, meaning that subsequent attempts to
>>>> determine the RAM page size will get an accurate result.
>>>>
>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>> ---
>>>>  numa.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> Paolo et al, as with my earlier patches adding some extensions to the
>>>> helpers for determining backing page sizes, if there are no objections
>>>> can I get an ack to merge this via my ppc tree?
>>>>
>>>> diff --git a/numa.c b/numa.c
>>>> index 1116c90af9..78a869e598 100644
>>>> --- a/numa.c
>>>> +++ b/numa.c
>>>> @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>>>>              /* Legacy behavior: if allocation failed, fall back to
>>>>               * regular RAM allocation.
>>>>               */
>>>> +            mem_path = NULL;
>>>>              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
>>>>          }
>>>>  #else  
>>>
>>> mem_path is also used by kvm_s390_apply_cpu_model(),
>>> and in ccw_init() memory is initialized before CPUs are
>>> so if QEM was started with -mem-path, then before patch
>>> created CPU won't have CMM enabled and print warning:
>>>   
>>>  "CMM will not be enabled because it is not compatible with hugetlbfs."
>>>
>>> and after patch it might enable CMM if we clear mem_path.
>>> So question is do we care about this?
>>
>> I don't quite remember the cmm semantics here -- Christian?
> 
> The CMMA interface does not work on large pages. I think the kernel will react
> with EFAULT in some cases (cmma migration and others) so qemu will probably fail
> unexpectedly. 
> 
> But this patch seems to only clear mem-path if we do not allocate at all from
> hugetlbfs. So things should be ok, no?
> 
> 

This even looks like the right thing to me, as hugetlbfs was never
supported.

-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [qemu-s390x] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19 14:11       ` David Hildenbrand
@ 2018-04-19 16:08         ` Greg Kurz
  2018-04-20  2:17           ` David Gibson
  2018-04-20  7:13           ` Christian Borntraeger
  0 siblings, 2 replies; 12+ messages in thread
From: Greg Kurz @ 2018-04-19 16:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, Cornelia Huck, Igor Mammedov,
	David Gibson, ehabkost, qemu-devel, qemu-s390x, qemu-ppc, clg,
	pbonzini

On Thu, 19 Apr 2018 16:11:37 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 19.04.2018 15:34, Christian Borntraeger wrote:
> > 
> > 
> > On 04/19/2018 02:58 PM, Cornelia Huck wrote:  
> >> On Thu, 19 Apr 2018 14:33:18 +0200
> >> Igor Mammedov <imammedo@redhat.com> wrote:
> >>  
> >>> On Thu, 19 Apr 2018 17:21:23 +1000
> >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> >>>  
> >>>> If the -mem-path option is set, we attempt to map the guest's RAM from a
> >>>> file in the given path; it's usually used to back guest RAM with hugepages.
> >>>> If we're unable to (e.g. not enough free hugepages) then we fall back to
> >>>> allocating normal anonymous pages.  This behaviour can be surprising, but a
> >>>> comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
> >>>> we can't change.
> >>>>
> >>>> What really isn't ok, though, is that in this case we leave mem_path set.
> >>>> That means functions which attempt to determine the pagesize of main RAM
> >>>> can erroneously think it is hugepage based on the requested path, even
> >>>> though it's not.
> >>>>
> >>>> This is particular bad for the pseries machine type.  KVM HV limitations
> >>>> mean the guest can't use pagesizes larger than the host page size used to
> >>>> back RAM.  That means that such a fallback, rather than merely giving
> >>>> poorer performance that expected will cause the guest to freeze up early in
> >>>> boot as it attempts to use large page mappings that can't work.
> >>>>
> >>>> This patch addresses the problem by clearing the mem_path variable when we
> >>>> fall back to anonymous pages, meaning that subsequent attempts to
> >>>> determine the RAM page size will get an accurate result.
> >>>>
> >>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>>> ---
> >>>>  numa.c | 1 +
> >>>>  1 file changed, 1 insertion(+)
> >>>>
> >>>> Paolo et al, as with my earlier patches adding some extensions to the
> >>>> helpers for determining backing page sizes, if there are no objections
> >>>> can I get an ack to merge this via my ppc tree?
> >>>>
> >>>> diff --git a/numa.c b/numa.c
> >>>> index 1116c90af9..78a869e598 100644
> >>>> --- a/numa.c
> >>>> +++ b/numa.c
> >>>> @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> >>>>              /* Legacy behavior: if allocation failed, fall back to
> >>>>               * regular RAM allocation.
> >>>>               */
> >>>> +            mem_path = NULL;
> >>>>              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
> >>>>          }
> >>>>  #else    
> >>>
> >>> mem_path is also used by kvm_s390_apply_cpu_model(),
> >>> and in ccw_init() memory is initialized before CPUs are

Something similar happens with spapr: kvm_fixup_page_sizes() calls
qemu_getrampagesize() during CPU start, which happens before the machine
init calls allocate_system_memory_nonnuma(). Shouldn't we allocate memory
before calling spapr_init_cpus() in spapr_machine_init() then ?
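
The ordering concern can be sketched with a standalone model. This is not
QEMU code: cpu_start(), allocate_ram_with_fallback() and
latched_page_size() are hypothetical names, and the page sizes are assumed
values. The point is only that a page size latched before the fallback
clears mem_path stays stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BASE_PAGE_SIZE (4UL * 1024)         /* assumed host base page size */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)  /* assumed hugepage size */

static const char *mem_path;             /* stand-in for the -mem-path global */
static unsigned long cpu_max_page_size;  /* value the CPU latches at start */

/* Simplified helper that infers the page size from mem_path alone. */
static unsigned long ram_page_size(void)
{
    return mem_path ? HUGE_PAGE_SIZE : BASE_PAGE_SIZE;
}

/* Models CPU start: the page size is sampled once and kept. */
static void cpu_start(void)
{
    cpu_max_page_size = ram_page_size();
}

/* Models a failed hugepage allocation with the patch applied. */
static void allocate_ram_with_fallback(void)
{
    mem_path = NULL;
}

/* Run both init steps in the requested order; return the latched size. */
static unsigned long latched_page_size(bool memory_first)
{
    mem_path = "/dev/hugepages";
    if (memory_first) {
        allocate_ram_with_fallback();
        cpu_start();
    } else {
        cpu_start();
        allocate_ram_with_fallback();
    }
    return cpu_max_page_size;
}

/* Usage: only the memory-first order sees the post-fallback state. */
void check_init_order_model(void)
{
    assert(latched_page_size(false) == HUGE_PAGE_SIZE);
    assert(latched_page_size(true) == BASE_PAGE_SIZE);
}
```
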

> >>> so if QEM was started with -mem-path, then before patch
> >>> created CPU won't have CMM enabled and print warning:
> >>>   
> >>>  "CMM will not be enabled because it is not compatible with hugetlbfs."
> >>>
> >>> and after patch it might enable CMM if we clear mem_path.
> >>> So question is do we care about this?  
> >>
> >> I don't quite remember the cmm semantics here -- Christian?  
> > 
> > The CMMA interface does not work on large pages. I think the kernel will react
> > with EFAULT in some cases (cmma migration and others) so qemu will probably fail
> > unexpectedly. 
> > 
> > But this patch seems to only clear mem-path if we do not allocate at all from
> > hugetlbfs. So things should be ok, no?
> > 
> >   
> 
> This even looks like the right thing to me, as hugetlbfs was never
> supported.
> 

Unrelated to this patch: -mem-path can be given a path that doesn't sit on
a hugetlbfs, in which case we use getpagesize(). Is there a reason for
kvm_s390_enable_cmma() to filter out this case as well? Or should we rather
check that mem_path isn't NULL and points to a hugetlbfs?
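
A check of that kind could look roughly like the sketch below. It is a
Linux-only illustration, not the QEMU implementation: path_is_hugetlbfs()
is a hypothetical helper name, and treating a statfs(2) failure as "not
hugetlbfs" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/vfs.h>    /* statfs(2), Linux-specific */

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6UL    /* value from <linux/magic.h> */
#endif

/* Hypothetical helper: true only if path is non-NULL and on hugetlbfs. */
static bool path_is_hugetlbfs(const char *path)
{
    struct statfs fs;

    if (path == NULL) {
        return false;
    }
    if (statfs(path, &fs) != 0) {
        return false;   /* assumption: errors count as "not hugetlbfs" */
    }
    /* Cast so the comparison also works where f_type is a signed 32-bit. */
    return (unsigned long)fs.f_type == (unsigned long)HUGETLBFS_MAGIC;
}

/* Usage: a NULL or missing path is never reported as hugetlbfs. */
void check_hugetlbfs_helper(void)
{
    assert(!path_is_hugetlbfs(NULL));
    assert(!path_is_hugetlbfs("/nonexistent-path-for-illustration"));
}
```

With a helper like this, a caller would test the filesystem type rather
than the mere presence of a path, so a -mem-path pointing at tmpfs or a
regular directory would not be mistaken for hugepage backing.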


* Re: [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19  7:21 [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation David Gibson
  2018-04-19 12:33 ` Igor Mammedov
@ 2018-04-19 16:30 ` Greg Kurz
  2018-04-20  2:18   ` David Gibson
  1 sibling, 1 reply; 12+ messages in thread
From: Greg Kurz @ 2018-04-19 16:30 UTC (permalink / raw)
  To: David Gibson; +Cc: pbonzini, imammedo, ehabkost, clg, qemu-ppc, qemu-devel

On Thu, 19 Apr 2018 17:21:23 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> If the -mem-path option is set, we attempt to map the guest's RAM from a
> file in the given path; it's usually used to back guest RAM with hugepages.
> If we're unable to (e.g. not enough free hugepages) then we fall back to
> allocating normal anonymous pages.  This behaviour can be surprising, but a
> comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
> we can't change.
> 
> What really isn't ok, though, is that in this case we leave mem_path set.
> That means functions which attempt to determine the pagesize of main RAM
> can erroneously think it is hugepage based on the requested path, even
> though it's not.
> 
> This is particular bad for the pseries machine type.  KVM HV limitations
> mean the guest can't use pagesizes larger than the host page size used to
> back RAM.  That means that such a fallback, rather than merely giving
> poorer performance that expected will cause the guest to freeze up early in

s/that expected/than expected/

> boot as it attempts to use large page mappings that can't work.
> 
> This patch addresses the problem by clearing the mem_path variable when we
> fall back to anonymous pages, meaning that subsequent attempts to
> determine the RAM page size will get an accurate result.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  numa.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> Paolo et al, as with my earlier patches adding some extensions to the
> helpers for determining backing page sizes, if there are no objections
> can I get an ack to merge this via my ppc tree?
> 
> diff --git a/numa.c b/numa.c
> index 1116c90af9..78a869e598 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>              /* Legacy behavior: if allocation failed, fall back to
>               * regular RAM allocation.
>               */
> +            mem_path = NULL;
>              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
>          }
>  #else


* Re: [Qemu-devel] [qemu-s390x] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19 16:08         ` Greg Kurz
@ 2018-04-20  2:17           ` David Gibson
  2018-04-20  7:13           ` Christian Borntraeger
  1 sibling, 0 replies; 12+ messages in thread
From: David Gibson @ 2018-04-20  2:17 UTC (permalink / raw)
  To: Greg Kurz
  Cc: David Hildenbrand, Christian Borntraeger, Cornelia Huck,
	Igor Mammedov, ehabkost, qemu-devel, qemu-s390x, qemu-ppc, clg,
	pbonzini


On Thu, Apr 19, 2018 at 06:08:51PM +0200, Greg Kurz wrote:
> On Thu, 19 Apr 2018 16:11:37 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
> > On 19.04.2018 15:34, Christian Borntraeger wrote:
> > > 
> > > 
> > > On 04/19/2018 02:58 PM, Cornelia Huck wrote:  
> > >> On Thu, 19 Apr 2018 14:33:18 +0200
> > >> Igor Mammedov <imammedo@redhat.com> wrote:
> > >>  
> > >>> On Thu, 19 Apr 2018 17:21:23 +1000
> > >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> > >>>  
> > >>>> If the -mem-path option is set, we attempt to map the guest's RAM from a
> > >>>> file in the given path; it's usually used to back guest RAM with hugepages.
> > >>>> If we're unable to (e.g. not enough free hugepages) then we fall back to
> > >>>> allocating normal anonymous pages.  This behaviour can be surprising, but a
> > >>>> comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
> > >>>> we can't change.
> > >>>>
> > >>>> What really isn't ok, though, is that in this case we leave mem_path set.
> > >>>> That means functions which attempt to determine the pagesize of main RAM
> > >>>> can erroneously think it is hugepage based on the requested path, even
> > >>>> though it's not.
> > >>>>
> > >>>> This is particular bad for the pseries machine type.  KVM HV limitations
> > >>>> mean the guest can't use pagesizes larger than the host page size used to
> > >>>> back RAM.  That means that such a fallback, rather than merely giving
> > >>>> poorer performance that expected will cause the guest to freeze up early in
> > >>>> boot as it attempts to use large page mappings that can't work.
> > >>>>
> > >>>> This patch addresses the problem by clearing the mem_path variable when we
> > >>>> fall back to anonymous pages, meaning that subsequent attempts to
> > >>>> determine the RAM page size will get an accurate result.
> > >>>>
> > >>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > >>>> ---
> > >>>>  numa.c | 1 +
> > >>>>  1 file changed, 1 insertion(+)
> > >>>>
> > >>>> Paolo et al, as with my earlier patches adding some extensions to the
> > >>>> helpers for determining backing page sizes, if there are no objections
> > >>>> can I get an ack to merge this via my ppc tree?
> > >>>>
> > >>>> diff --git a/numa.c b/numa.c
> > >>>> index 1116c90af9..78a869e598 100644
> > >>>> --- a/numa.c
> > >>>> +++ b/numa.c
> > >>>> @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> > >>>>              /* Legacy behavior: if allocation failed, fall back to
> > >>>>               * regular RAM allocation.
> > >>>>               */
> > >>>> +            mem_path = NULL;
> > >>>>              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
> > >>>>          }
> > >>>>  #else    
> > >>>
> > >>> mem_path is also used by kvm_s390_apply_cpu_model(),
> > >>> and in ccw_init() memory is initialized before CPUs are
> 
> Something similar happens with spapr: kvm_fixup_page_sizes() calls
> qemu_getrampagesize() during CPU start, which happens before the machine
> init calls allocate_system_memory_nonnuma(). Shouldn't we allocate memory
> before calling spapr_init_cpus() in spapr_machine_init() then ?

Note that the way kvm_fixup_page_sizes() works is broken in its own
right - this patch was actually written as a preliminary to fixing
that.

> > >>> so if QEM was started with -mem-path, then before patch
> > >>> created CPU won't have CMM enabled and print warning:
> > >>>   
> > >>>  "CMM will not be enabled because it is not compatible with hugetlbfs."
> > >>>
> > >>> and after patch it might enable CMM if we clear mem_path.
> > >>> So question is do we care about this?  
> > >>
> > >> I don't quite remember the cmm semantics here -- Christian?  
> > > 
> > > The CMMA interface does not work on large pages. I think the kernel will react
> > > with EFAULT in some cases (cmma migration and others) so qemu will probably fail
> > > unexpectedly. 
> > > 
> > > But this patch seems to only clear mem-path if we do not allocate at all from
> > > hugetlbfs. So things should be ok, no?
> > > 
> > >   
> > 
> > This even looks like the right thing to me, as hugetlbfs was never
> > supported.
> > 
> 
> Unrelated to this patch, -mem-path can be passed something that doesn't sit
> in a hugetlbfs, in which case we use getpagesize()... is there a reason for
> kvm_s390_enable_cmma() to filter out this case as well ? Or should we rather
> check mem_path isn't NULL and points to a hugetlbfs ?
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19 16:30 ` [Qemu-devel] " Greg Kurz
@ 2018-04-20  2:18   ` David Gibson
  2018-04-20 15:34     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: David Gibson @ 2018-04-20  2:18 UTC (permalink / raw)
  To: Greg Kurz; +Cc: pbonzini, imammedo, ehabkost, clg, qemu-ppc, qemu-devel


On Thu, Apr 19, 2018 at 06:30:37PM +0200, Greg Kurz wrote:
> On Thu, 19 Apr 2018 17:21:23 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > If the -mem-path option is set, we attempt to map the guest's RAM from a
> > file in the given path; it's usually used to back guest RAM with hugepages.
> > If we're unable to (e.g. not enough free hugepages) then we fall back to
> > allocating normal anonymous pages.  This behaviour can be surprising, but a
> > comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
> > we can't change.
> > 
> > What really isn't ok, though, is that in this case we leave mem_path set.
> > That means functions which attempt to determine the pagesize of main RAM
> > can erroneously think it is hugepage based on the requested path, even
> > though it's not.
> > 
> > This is particular bad for the pseries machine type.  KVM HV limitations
> > mean the guest can't use pagesizes larger than the host page size used to
> > back RAM.  That means that such a fallback, rather than merely giving
> > poorer performance that expected will cause the guest to freeze up early in
> 
> s/that expected/than expected/

Adjusted, thanks.

> 
> > boot as it attempts to use large page mappings that can't work.
> > 
> > This patch addresses the problem by clearing the mem_path variable when we
> > fall back to anonymous pages, meaning that subsequent attempts to
> > determine the RAM page size will get an accurate result.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  numa.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > Paolo et al, as with my earlier patches adding some extensions to the
> > helpers for determining backing page sizes, if there are no objections
> > can I get an ack to merge this via my ppc tree?
> > 
> > diff --git a/numa.c b/numa.c
> > index 1116c90af9..78a869e598 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> >              /* Legacy behavior: if allocation failed, fall back to
> >               * regular RAM allocation.
> >               */
> > +            mem_path = NULL;
> >              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
> >          }
> >  #else
> 
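
For readers following the thread, the effect of the one-line change can be modelled with a small standalone sketch. This is not QEMU code: every `sketch_*` name, the two page-size constants, and the "free hugepages" parameters are hypothetical stand-ins. The only thing taken from the patch is the idea of clearing the path when allocation falls back to anonymous memory, so that a later page-size query (the role `qemu_getrampagesize()` plays in the discussion) no longer reports hugepages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants only; real page sizes depend on the host. */
#define SKETCH_BASE_PAGE_SIZE (4UL * 1024)
#define SKETCH_HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical stand-in for the global that -mem-path sets. */
static const char *sketch_mem_path;

/* Hypothetical allocator: file-backed allocation succeeds only if
 * enough hugepages are free. */
static bool sketch_alloc_from_path(const char *path,
                                   size_t free_hugepages, size_t needed)
{
    assert(path != NULL);
    return free_hugepages >= needed;
}

/* Models the patched fallback: when file-backed allocation fails,
 * forget the path before falling back to anonymous memory. */
static void sketch_allocate_ram(size_t free_hugepages, size_t needed)
{
    if (sketch_mem_path &&
        !sketch_alloc_from_path(sketch_mem_path, free_hugepages, needed)) {
        sketch_mem_path = NULL; /* the one-line fix */
        /* ...anonymous allocation would happen here... */
    }
}

/* Simplified page-size query that trusts the path alone, which is
 * the failure mode described in the commit message. */
static unsigned long sketch_ram_page_size(void)
{
    return sketch_mem_path ? SKETCH_HUGE_PAGE_SIZE : SKETCH_BASE_PAGE_SIZE;
}
```

Without the assignment to NULL, `sketch_ram_page_size()` would keep returning the hugepage size after a failed file-backed allocation, which corresponds to the pseries boot hang described above.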

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [qemu-s390x] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-19 16:08         ` Greg Kurz
  2018-04-20  2:17           ` David Gibson
@ 2018-04-20  7:13           ` Christian Borntraeger
  1 sibling, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2018-04-20  7:13 UTC (permalink / raw)
  To: Greg Kurz, David Hildenbrand
  Cc: Cornelia Huck, Igor Mammedov, David Gibson, ehabkost, qemu-devel,
	qemu-s390x, qemu-ppc, clg, pbonzini



On 04/19/2018 06:08 PM, Greg Kurz wrote:
> On Thu, 19 Apr 2018 16:11:37 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 19.04.2018 15:34, Christian Borntraeger wrote:
>>>
>>>
>>> On 04/19/2018 02:58 PM, Cornelia Huck wrote:  
>>>> On Thu, 19 Apr 2018 14:33:18 +0200
>>>> Igor Mammedov <imammedo@redhat.com> wrote:
>>>>  
>>>>> On Thu, 19 Apr 2018 17:21:23 +1000
>>>>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>>>>  
>>>>>> If the -mem-path option is set, we attempt to map the guest's RAM from a
>>>>>> file in the given path; it's usually used to back guest RAM with hugepages.
>>>>>> If we're unable to (e.g. not enough free hugepages) then we fall back to
>>>>>> allocating normal anonymous pages.  This behaviour can be surprising, but a
>>>>>> comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour
>>>>>> we can't change.
>>>>>>
>>>>>> What really isn't ok, though, is that in this case we leave mem_path set.
>>>>>> That means functions which attempt to determine the pagesize of main RAM
>>>>>> can erroneously think it is hugepage based on the requested path, even
>>>>>> though it's not.
>>>>>>
>>>>>> This is particular bad for the pseries machine type.  KVM HV limitations
>>>>>> mean the guest can't use pagesizes larger than the host page size used to
>>>>>> back RAM.  That means that such a fallback, rather than merely giving
>>>>>> poorer performance that expected will cause the guest to freeze up early in
>>>>>> boot as it attempts to use large page mappings that can't work.
>>>>>>
>>>>>> This patch addresses the problem by clearing the mem_path variable when we
>>>>>> fall back to anonymous pages, meaning that subsequent attempts to
>>>>>> determine the RAM page size will get an accurate result.
>>>>>>
>>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>> ---
>>>>>>  numa.c | 1 +
>>>>>>  1 file changed, 1 insertion(+)
>>>>>>
>>>>>> Paolo et al, as with my earlier patches adding some extensions to the
>>>>>> helpers for determining backing page sizes, if there are no objections
>>>>>> can I get an ack to merge this via my ppc tree?
>>>>>>
>>>>>> diff --git a/numa.c b/numa.c
>>>>>> index 1116c90af9..78a869e598 100644
>>>>>> --- a/numa.c
>>>>>> +++ b/numa.c
>>>>>> @@ -469,6 +469,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>>>>>>              /* Legacy behavior: if allocation failed, fall back to
>>>>>>               * regular RAM allocation.
>>>>>>               */
>>>>>> +            mem_path = NULL;
>>>>>>              memory_region_init_ram_nomigrate(mr, owner, name, ram_size, &error_fatal);
>>>>>>          }
>>>>>>  #else    
>>>>>
>>>>> mem_path is also used by kvm_s390_apply_cpu_model(),
>>>>> and in ccw_init() memory is initialized before CPUs are
> 
> Something similar happens with spapr: kvm_fixup_page_sizes() calls
> qemu_getrampagesize() during CPU start, which happens before the machine
> init calls allocate_system_memory_nonnuma(). Shouldn't we allocate memory
> before calling spapr_init_cpus() in spapr_machine_init() then ?
> 
>>>>> so if QEMU was started with -mem-path, then before the patch the
>>>>> created CPU won't have CMM enabled and will print the warning:
>>>>>   
>>>>>  "CMM will not be enabled because it is not compatible with hugetlbfs."
>>>>>
>>>>> and after patch it might enable CMM if we clear mem_path.
>>>>> So question is do we care about this?  
>>>>
>>>> I don't quite remember the cmm semantics here -- Christian?  
>>>
>>> The CMMA interface does not work on large pages. I think the kernel will react
>>> with EFAULT in some cases (cmma migration and others) so qemu will probably fail
>>> unexpectedly. 
>>>
>>> But this patch seems to only clear mem-path if we do not allocate at all from
>>> hugetlbfs. So things should be ok, no?
>>>
>>>   
>>
>> This even looks like the right thing to me, as hugetlbfs was never
>> supported.
>>
> 
> Unrelated to this patch, -mem-path can be passed something that doesn't sit
> in a hugetlbfs, in which case we use getpagesize()... is there a reason for
> kvm_s390_enable_cmma() to filter out this case as well ? Or should we rather
> check mem_path isn't NULL and points to a hugetlbfs ?

CMM is somewhat special, so I prefer to enable it only for non-mem-path memory,
since I know that it works for anonymous pages. I would rather whitelist other
backing mechanisms in the future if necessary.
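
The allow-list preference stated here can be sketched as follows. This is an assumption-laden illustration, not the actual `kvm_s390_enable_cmma()` logic: the enum, both `sketch_*` functions, and the `on_hugetlbfs` flag are hypothetical. It only shows the policy of enabling CMM for anonymous memory and rejecting every other backing until it is explicitly added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of how guest RAM is backed. */
enum sketch_backing {
    SKETCH_BACKING_ANON,       /* no -mem-path, or fallback cleared it */
    SKETCH_BACKING_HUGETLBFS,  /* -mem-path on a hugetlbfs mount */
    SKETCH_BACKING_OTHER_FILE  /* -mem-path on some other filesystem */
};

static enum sketch_backing sketch_classify_backing(const char *mem_path,
                                                   bool on_hugetlbfs)
{
    if (mem_path == NULL) {
        /* Without a path there is nothing that could be hugetlbfs. */
        assert(!on_hugetlbfs);
        return SKETCH_BACKING_ANON;
    }
    return on_hugetlbfs ? SKETCH_BACKING_HUGETLBFS
                        : SKETCH_BACKING_OTHER_FILE;
}

/* Allow-list policy: only backings known to work are accepted;
 * anything else must be added here explicitly later. */
static bool sketch_cmm_allowed(enum sketch_backing backing)
{
    switch (backing) {
    case SKETCH_BACKING_ANON:
        return true;
    case SKETCH_BACKING_HUGETLBFS:
    case SKETCH_BACKING_OTHER_FILE:
    default:
        return false;
    }
}
```

Under this policy a non-hugetlbfs -mem-path is rejected as well, which is the stricter of the two options raised in the question above.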


* Re: [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-20  2:18   ` David Gibson
@ 2018-04-20 15:34     ` Paolo Bonzini
  2018-04-21  9:20       ` David Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2018-04-20 15:34 UTC (permalink / raw)
  To: David Gibson, Greg Kurz; +Cc: imammedo, ehabkost, clg, qemu-ppc, qemu-devel

On 20/04/2018 04:18, David Gibson wrote:
> Paolo et al, as with my earlier patches adding some extensions to the
> helpers for determining backing page sizes, if there are no objections
> can I get an ack to merge this via my ppc tree?

Yes, go ahead!

Thanks,

Paolo



* Re: [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation
  2018-04-20 15:34     ` Paolo Bonzini
@ 2018-04-21  9:20       ` David Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2018-04-21  9:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Greg Kurz, imammedo, ehabkost, clg, qemu-ppc, qemu-devel

On Fri, Apr 20, 2018 at 05:34:38PM +0200, Paolo Bonzini wrote:
> On 20/04/2018 04:18, David Gibson wrote:
> > Paolo et al, as with my earlier patches adding some extensions to the
> > helpers for determining backing page sizes, if there are no objections
> > can I get an ack to merge this via my ppc tree?
> 
> Yes, go ahead!

Thanks, applied to ppc-for-2.13.

> 
> Thanks,
> 
> Paolo
> 




-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


Thread overview: 12+ messages
2018-04-19  7:21 [Qemu-devel] [PATCH for-2.13] Clear mem_path if we fall back to anonymous RAM allocation David Gibson
2018-04-19 12:33 ` Igor Mammedov
2018-04-19 12:58   ` [Qemu-devel] [qemu-s390x] " Cornelia Huck
2018-04-19 13:34     ` Christian Borntraeger
2018-04-19 14:11       ` David Hildenbrand
2018-04-19 16:08         ` Greg Kurz
2018-04-20  2:17           ` David Gibson
2018-04-20  7:13           ` Christian Borntraeger
2018-04-19 16:30 ` [Qemu-devel] " Greg Kurz
2018-04-20  2:18   ` David Gibson
2018-04-20 15:34     ` Paolo Bonzini
2018-04-21  9:20       ` David Gibson
