* Re: 9 TiB vm memory creation
       [not found] <alpine.DEB.2.22.394.2202141048390.13781@anisinha-lenovo>
@ 2022-02-14 12:36 ` Igor Mammedov
  2022-02-14 14:37   ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Igor Mammedov @ 2022-02-14 12:36 UTC (permalink / raw)
  To: Ani Sinha, QEMU Developers, David Hildenbrand

On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
Ani Sinha <ani@anisinha.ca> wrote:

> Hi Igor:
> 
> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
> system with the following commandline before either the system
> destabilized or the OOM killed killed qemu
> 
> -m 2T,maxmem=9T,slots=1 \
> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
> -machine memory-backend=mem0 \
> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
> -device isa-debugcon,iobase=0x402,chardev=debugcon \
> 
> I have attached the debugcon output from 2 TiB vm.
> Is there any other commandline parameters or options I should try?
> 
> thanks
> ani

$ truncate -s 9T 9tb_sparse_disk.img
$ qemu-system-x86_64 -m 9T \
  -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
  -machine memory-backend=mem0

works for me up to the GRUB menu; with sufficient guest kernel
persuasion (i.e. limiting RAM size on the kernel command line to something
reasonable) you can boot a Linux guest on it and inspect the SMBIOS tables
comfortably.
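(e.g. something like mem=32G on the guest kernel command line -- an
illustrative value, not one actually used in this thread)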


With KVM enabled it bails out with:
   qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument

All of that on a host with 32G of RAM, no swap.




* Re: 9 TiB vm memory creation
  2022-02-14 12:36 ` 9 TiB vm memory creation Igor Mammedov
@ 2022-02-14 14:37   ` David Hildenbrand
  2022-02-14 15:55       ` Igor Mammedov
  2022-02-15  7:00     ` Ani Sinha
  0 siblings, 2 replies; 17+ messages in thread
From: David Hildenbrand @ 2022-02-14 14:37 UTC (permalink / raw)
  To: Igor Mammedov, Ani Sinha, QEMU Developers

On 14.02.22 13:36, Igor Mammedov wrote:
> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
> Ani Sinha <ani@anisinha.ca> wrote:
> 
>> Hi Igor:
>>
>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
>> system with the following commandline before either the system
>> destabilized or the OOM killed killed qemu
>>
>> -m 2T,maxmem=9T,slots=1 \
>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
>> -machine memory-backend=mem0 \
>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
>>
>> I have attached the debugcon output from 2 TiB vm.
>> Is there any other commandline parameters or options I should try?
>>
>> thanks
>> ani
> 
> $ truncate -s 9T 9tb_sparse_disk.img
> $ qemu-system-x86_64 -m 9T \
>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
>   -machine memory-backend=mem0
> 
> works for me till GRUB menu, with sufficient guest kernel
> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
> guest on it and inspect SMBIOS tables comfortably.
> 
> 
> With KVM enabled it bails out with:
>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> 
> all of that on a host with 32G of RAM/no swap.
> 

#define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)

~8 TiB (7,999999)



In QEMU, we have

static hwaddr kvm_max_slot_size = ~0;

And only s390x sets

kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);

with

#define KVM_SLOT_MAX_BYTES (4UL * TiB)
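For reference, the arithmetic behind that ~8 TiB figure, as a small
stand-alone sketch assuming 4 KiB pages (purely illustrative; only the
macro value comes from the kernel header quoted above):

    #include <stdio.h>
    #include <stdint.h>

    #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
    #define PAGE_SZ              4096ULL   /* assuming 4 KiB pages */

    int main(void)
    {
        uint64_t max_bytes = (uint64_t)KVM_MEM_MAX_NR_PAGES * PAGE_SZ;
        uint64_t requested = 0x8ff40000000ULL;  /* slot size from the error above */

        printf("per-slot limit: %llu bytes (~%.9f TiB)\n",
               (unsigned long long)max_bytes, max_bytes / (double)(1ULL << 40));
        printf("requested slot: %llu bytes (~%.3f TiB) -> %s\n",
               (unsigned long long)requested, requested / (double)(1ULL << 40),
               requested > max_bytes ? "EINVAL" : "fits");
        return 0;
    }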

-- 
Thanks,

David / dhildenb




* Re: 9 TiB vm memory creation
  2022-02-14 14:37   ` David Hildenbrand
@ 2022-02-14 15:55       ` Igor Mammedov
  2022-02-15  7:00     ` Ani Sinha
  1 sibling, 0 replies; 17+ messages in thread
From: Igor Mammedov @ 2022-02-14 15:55 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Ani Sinha, QEMU Developers, Paolo Bonzini, kvm

On Mon, 14 Feb 2022 15:37:53 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 14.02.22 13:36, Igor Mammedov wrote:
> > On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
> > Ani Sinha <ani@anisinha.ca> wrote:
> >   
> >> Hi Igor:
> >>
> >> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
> >> system with the following commandline before either the system
> >> destabilized or the OOM killed killed qemu
> >>
> >> -m 2T,maxmem=9T,slots=1 \
> >> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
> >> -machine memory-backend=mem0 \
> >> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
> >> -device isa-debugcon,iobase=0x402,chardev=debugcon \
> >>
> >> I have attached the debugcon output from 2 TiB vm.
> >> Is there any other commandline parameters or options I should try?
> >>
> >> thanks
> >> ani  
> > 
> > $ truncate -s 9T 9tb_sparse_disk.img
> > $ qemu-system-x86_64 -m 9T \
> >   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
> >   -machine memory-backend=mem0
> > 
> > works for me till GRUB menu, with sufficient guest kernel
> > persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
> > guest on it and inspect SMBIOS tables comfortably.
> > 
> > 
> > With KVM enabled it bails out with:
> >    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> > 
> > all of that on a host with 32G of RAM/no swap.
> >
> >   
> 
> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
> 
> ~8 TiB (7,999999)

so essentially that's our max for initial RAM
(ignoring the initial RAM slots below 4 GiB)

Are you aware of any attempts to make it larger?

But can we use extra pc-dimm devices for additional memory (each within the
8 TiB limit), since each of those will use another memslot?


> 
> In QEMU, we have
> 
> static hwaddr kvm_max_slot_size = ~0;
> 
> And only s390x sets
> 
> kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
> 
> with
> 
> #define KVM_SLOT_MAX_BYTES (4UL * TiB)
in QEMU the default value is:
  static hwaddr kvm_max_slot_size = ~0;
so it is the kernel side that's failing








* Re: 9 TiB vm memory creation
  2022-02-14 15:55       ` Igor Mammedov
@ 2022-02-14 16:32         ` David Hildenbrand
  -1 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2022-02-14 16:32 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: Ani Sinha, QEMU Developers, Paolo Bonzini, kvm

>>>
>>> With KVM enabled it bails out with:
>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
>>>
>>> all of that on a host with 32G of RAM/no swap.
>>>
>>>   
>>
>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
>>
>> ~8 TiB (7,999999)
> 
> so essentially that's the our max for initial RAM
> (ignoring initial RAM slots before 4Gb)
> 
> Are you aware of any attempts to make it larger?

Not really, I think for now only s390x had applicable machines where
you'd have that much memory on a single NUMA node.

> 
> But can we use extra pc-dimm devices for additional memory (with 8TiB limit)
> as that will use another memslot?

I remember that was the workaround for now for some extremely large VMs
where you'd want a single NUMA node or a lot of memory for a single NUMA
node.
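Something along these lines should work as that kind of split (an untested
sketch; the file names and the 4T/5T split are made up, the point is just
that each backend stays under the per-memslot limit):

$ truncate -s 4T ram0.img
$ truncate -s 5T ram1.img
$ qemu-system-x86_64 -enable-kvm \
    -m 4T,maxmem=9T,slots=1 \
    -object memory-backend-file,id=mem0,size=4T,mem-path=ram0.img,prealloc=off,share=on \
    -machine memory-backend=mem0 \
    -object memory-backend-file,id=mem1,size=5T,mem-path=ram1.img,prealloc=off,share=on \
    -device pc-dimm,id=dimm1,memdev=mem1

Each backend then gets its own KVM memslot, so neither crosses the ~8 TiB limit.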

> 
> 
>>
>> In QEMU, we have
>>
>> static hwaddr kvm_max_slot_size = ~0;
>>
>> And only s390x sets
>>
>> kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
>>
>> with
>>
>> #define KVM_SLOT_MAX_BYTES (4UL * TiB)
> in QEMU default value is:
>   static hwaddr kvm_max_slot_size = ~0
> it is kernel side that's failing

... and kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES) works around the
kernel limitation for s390x in user space.

I feel like the right thing would be to look into increasing the limit
in the kernel, and bail out if the kernel doesn't support it. Would
require a new kernel for starting gigantic VMs with a single large
memory backend, but then, it's a new use case.
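Roughly, the user-space workaround boils down to registering one big region
as several capped memslots, along these lines (a sketch of the idea only,
not QEMU's actual kvm_set_phys_mem(); the helper name is made up):

    #include <stdint.h>

    /* stand-in for the real KVM_SET_USER_MEMORY_REGION ioctl plumbing */
    int set_user_memory_region(uint64_t gpa, void *hva, uint64_t size);

    /* split an oversized region so no single slot exceeds max_slot_size */
    static int register_region(uint64_t gpa, void *hva, uint64_t size,
                               uint64_t max_slot_size)
    {
        while (size) {
            uint64_t slot_size = size < max_slot_size ? size : max_slot_size;

            if (set_user_memory_region(gpa, hva, slot_size) < 0) {
                return -1;          /* real code would need rollback here */
            }
            gpa  += slot_size;
            hva   = (char *)hva + slot_size;
            size -= slot_size;
        }
        return 0;
    }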

-- 
Thanks,

David / dhildenb



* Re: 9 TiB vm memory creation
  2022-02-14 14:37   ` David Hildenbrand
  2022-02-14 15:55       ` Igor Mammedov
@ 2022-02-15  7:00     ` Ani Sinha
  2022-02-15  7:11       ` Ani Sinha
                         ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Ani Sinha @ 2022-02-15  7:00 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Ani Sinha, Igor Mammedov, QEMU Developers



On Mon, 14 Feb 2022, David Hildenbrand wrote:

> On 14.02.22 13:36, Igor Mammedov wrote:
> > On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
> > Ani Sinha <ani@anisinha.ca> wrote:
> >
> >> Hi Igor:
> >>
> >> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
> >> system with the following commandline before either the system
> >> destabilized or the OOM killed killed qemu
> >>
> >> -m 2T,maxmem=9T,slots=1 \
> >> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
> >> -machine memory-backend=mem0 \
> >> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
> >> -device isa-debugcon,iobase=0x402,chardev=debugcon \
> >>
> >> I have attached the debugcon output from 2 TiB vm.
> >> Is there any other commandline parameters or options I should try?
> >>
> >> thanks
> >> ani
> >
> > $ truncate -s 9T 9tb_sparse_disk.img
> > $ qemu-system-x86_64 -m 9T \
> >   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
> >   -machine memory-backend=mem0
> >
> > works for me till GRUB menu, with sufficient guest kernel
> > persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
> > guest on it and inspect SMBIOS tables comfortably.
> >
> >
> > With KVM enabled it bails out with:
> >    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> >

I have seen this on my system, but not always. Maybe I should have dug
deeper into why I don't see this all the time.

> > all of that on a host with 32G of RAM/no swap.
> >

My system has 16 GiB of main memory, no swap.

>
> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
>
> ~8 TiB (7,999999)

That's not 8 TiB, that's 2 GiB. But yes, 0x8ff40000000 is certainly greater
than 2 GiB * 4K (assuming 4K size pages).

So in kvm_main.c in the kernel, we are likely hitting this:

	new.npages = mem->memory_size >> PAGE_SHIFT;

        if (new.npages > KVM_MEM_MAX_NR_PAGES)
                return -EINVAL;
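
Plugging in the numbers from the failure above (assuming 4 KiB pages,
i.e. PAGE_SHIFT = 12):

    0x8ff40000000 >> 12 = 0x8ff40000 = 2,415,132,672 pages
    KVM_MEM_MAX_NR_PAGES = (1UL << 31) - 1 = 2,147,483,647

so new.npages exceeds the limit and the ioctl fails with -EINVAL.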

>
>
>
> In QEMU, we have
>
> static hwaddr kvm_max_slot_size = ~0;
>
> And only s390x sets
>
> kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
>
> with
>
> #define KVM_SLOT_MAX_BYTES (4UL * TiB)
>


So it seems on Igor's system it's getting limited by KVM, not QEMU.



* Re: 9 TiB vm memory creation
  2022-02-15  7:00     ` Ani Sinha
@ 2022-02-15  7:11       ` Ani Sinha
  2022-02-15  7:29       ` Ani Sinha
  2022-02-15  7:55       ` David Hildenbrand
  2 siblings, 0 replies; 17+ messages in thread
From: Ani Sinha @ 2022-02-15  7:11 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Igor Mammedov, QEMU Developers, David Hildenbrand




> > static hwaddr kvm_max_slot_size = ~0;
> >
> > And only s390x sets
> >
> > kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
> >
> > with
> >
> > #define KVM_SLOT_MAX_BYTES (4UL * TiB)
> >
>
>
> So seems in Igor's system its getting limited by kvm not qemu.

Oops, sorry, I read through the thread. We are all saying the same thing.



* Re: 9 TiB vm memory creation
  2022-02-15  7:00     ` Ani Sinha
  2022-02-15  7:11       ` Ani Sinha
@ 2022-02-15  7:29       ` Ani Sinha
  2022-02-15  7:55       ` David Hildenbrand
  2 siblings, 0 replies; 17+ messages in thread
From: Ani Sinha @ 2022-02-15  7:29 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Igor Mammedov, QEMU Developers, David Hildenbrand


> > > With KVM enabled it bails out with:
> > >    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> > >
>
> I have seen this in my system but not always. Maybe I should have dug
> deeper as to why i do see this all the time.

Actually this would happen only when I was playing with memory larger than
8 TiB. So it makes sense.

I ran my script again and I can repro it right away:

2022-02-15T07:25:34.051320Z qemu-system-x86_64:
kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1,
start=0x100000000, size=0x8ff40000000: Invalid argument
kvm_set_phys_mem: error registering slot: Invalid argument

The other thing I had to do was

# echo 1 > /proc/sys/vm/overcommit_memory

otherwise the first mmap() in mmap_activate() fails.
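
For anyone reproducing this, the sysctl spelling of the same thing
(mode 1 = always overcommit, vs. the default heuristic mode 0):

    # sysctl -w vm.overcommit_memory=1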



* Re: 9 TiB vm memory creation
  2022-02-15  7:00     ` Ani Sinha
  2022-02-15  7:11       ` Ani Sinha
  2022-02-15  7:29       ` Ani Sinha
@ 2022-02-15  7:55       ` David Hildenbrand
  2022-02-15  8:12         ` Ani Sinha
  2 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2022-02-15  7:55 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Igor Mammedov, QEMU Developers

On 15.02.22 08:00, Ani Sinha wrote:
> 
> 
> On Mon, 14 Feb 2022, David Hildenbrand wrote:
> 
>> On 14.02.22 13:36, Igor Mammedov wrote:
>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
>>> Ani Sinha <ani@anisinha.ca> wrote:
>>>
>>>> Hi Igor:
>>>>
>>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
>>>> system with the following commandline before either the system
>>>> destabilized or the OOM killed killed qemu
>>>>
>>>> -m 2T,maxmem=9T,slots=1 \
>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
>>>> -machine memory-backend=mem0 \
>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
>>>>
>>>> I have attached the debugcon output from 2 TiB vm.
>>>> Is there any other commandline parameters or options I should try?
>>>>
>>>> thanks
>>>> ani
>>>
>>> $ truncate -s 9T 9tb_sparse_disk.img
>>> $ qemu-system-x86_64 -m 9T \
>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
>>>   -machine memory-backend=mem0
>>>
>>> works for me till GRUB menu, with sufficient guest kernel
>>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
>>> guest on it and inspect SMBIOS tables comfortably.
>>>
>>>
>>> With KVM enabled it bails out with:
>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
>>>
> 
> I have seen this in my system but not always. Maybe I should have dug
> deeper as to why i do see this all the time.
> 
>>> all of that on a host with 32G of RAM/no swap.
>>>
> 
> My system in 16 Gib of main memory, no swap.
> 
>>
>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
>>
>> ~8 TiB (7,999999)
> 
> That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
> than 2 Gib * 4K (assuming 4K size pages).

"pages" don't carry the unit "GiB/TiB", so I was talking about the
actual size with 4k pages (your setup, I assume)

-- 
Thanks,

David / dhildenb




* Re: 9 TiB vm memory creation
  2022-02-15  7:55       ` David Hildenbrand
@ 2022-02-15  8:12         ` Ani Sinha
  2022-02-15  8:38           ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Ani Sinha @ 2022-02-15  8:12 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Igor Mammedov, QEMU Developers

On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 15.02.22 08:00, Ani Sinha wrote:
> >
> >
> > On Mon, 14 Feb 2022, David Hildenbrand wrote:
> >
> >> On 14.02.22 13:36, Igor Mammedov wrote:
> >>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
> >>> Ani Sinha <ani@anisinha.ca> wrote:
> >>>
> >>>> Hi Igor:
> >>>>
> >>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
> >>>> system with the following commandline before either the system
> >>>> destabilized or the OOM killed killed qemu
> >>>>
> >>>> -m 2T,maxmem=9T,slots=1 \
> >>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
> >>>> -machine memory-backend=mem0 \
> >>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
> >>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
> >>>>
> >>>> I have attached the debugcon output from 2 TiB vm.
> >>>> Is there any other commandline parameters or options I should try?
> >>>>
> >>>> thanks
> >>>> ani
> >>>
> >>> $ truncate -s 9T 9tb_sparse_disk.img
> >>> $ qemu-system-x86_64 -m 9T \
> >>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
> >>>   -machine memory-backend=mem0
> >>>
> >>> works for me till GRUB menu, with sufficient guest kernel
> >>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
> >>> guest on it and inspect SMBIOS tables comfortably.
> >>>
> >>>
> >>> With KVM enabled it bails out with:
> >>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> >>>
> >
> > I have seen this in my system but not always. Maybe I should have dug
> > deeper as to why i do see this all the time.
> >
> >>> all of that on a host with 32G of RAM/no swap.
> >>>
> >
> > My system in 16 Gib of main memory, no swap.
> >
> >>
> >> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
> >>
> >> ~8 TiB (7,999999)
> >
> > That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
> > than 2 Gib * 4K (assuming 4K size pages).
>
> "pages" don't carry the unit "GiB/TiB", so I was talking about the
> actual size with 4k pages (your setup, I assume)

Yes, I got that after reading your email again.
The interesting question now is: how is Red Hat QE running a 9 TiB VM with KVM?

https://bugzilla-attachments.redhat.com/attachment.cgi?id=1795945



* Re: 9 TiB vm memory creation
  2022-02-15  8:12         ` Ani Sinha
@ 2022-02-15  8:38           ` David Hildenbrand
  2022-02-15  9:40             ` Ani Sinha
  2022-02-15 10:44             ` Daniel P. Berrangé
  0 siblings, 2 replies; 17+ messages in thread
From: David Hildenbrand @ 2022-02-15  8:38 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Igor Mammedov, QEMU Developers

On 15.02.22 09:12, Ani Sinha wrote:
> On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 15.02.22 08:00, Ani Sinha wrote:
>>>
>>>
>>> On Mon, 14 Feb 2022, David Hildenbrand wrote:
>>>
>>>> On 14.02.22 13:36, Igor Mammedov wrote:
>>>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
>>>>> Ani Sinha <ani@anisinha.ca> wrote:
>>>>>
>>>>>> Hi Igor:
>>>>>>
>>>>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
>>>>>> system with the following commandline before either the system
>>>>>> destabilized or the OOM killed killed qemu
>>>>>>
>>>>>> -m 2T,maxmem=9T,slots=1 \
>>>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
>>>>>> -machine memory-backend=mem0 \
>>>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
>>>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
>>>>>>
>>>>>> I have attached the debugcon output from 2 TiB vm.
>>>>>> Is there any other commandline parameters or options I should try?
>>>>>>
>>>>>> thanks
>>>>>> ani
>>>>>
>>>>> $ truncate -s 9T 9tb_sparse_disk.img
>>>>> $ qemu-system-x86_64 -m 9T \
>>>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
>>>>>   -machine memory-backend=mem0
>>>>>
>>>>> works for me till GRUB menu, with sufficient guest kernel
>>>>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
>>>>> guest on it and inspect SMBIOS tables comfortably.
>>>>>
>>>>>
>>>>> With KVM enabled it bails out with:
>>>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
>>>>>
>>>
>>> I have seen this in my system but not always. Maybe I should have dug
>>> deeper as to why i do see this all the time.
>>>
>>>>> all of that on a host with 32G of RAM/no swap.
>>>>>
>>>
>>> My system in 16 Gib of main memory, no swap.
>>>
>>>>
>>>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
>>>>
>>>> ~8 TiB (7,999999)
>>>
>>> That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
>>> than 2 Gib * 4K (assuming 4K size pages).
>>
>> "pages" don't carry the unit "GiB/TiB", so I was talking about the
>> actual size with 4k pages (your setup, I assume)
> 
> yes I got that after reading your email again.
> The interesting question now is how is redhat QE running 9 TiB vm with kvm?

As I already indicated regarding s390x only having single large NUMA
nodes, x86 usually uses multiple NUMA nodes with that much memory.
And QE seems to be using virtual NUMA nodes:

Each of the 32 virtual NUMA nodes receives a:

  -object memory-backend-ram,id=ram-node20,size=309237645312,host-
   nodes=0-31,policy=bind

which results in a dedicated KVM memslot (just like each DIMM would)


32 * 309237645312 == 9 TiB :)
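
For the record: 309237645312 bytes is 288 GiB per virtual NUMA node, so
32 * 288 GiB = 9216 GiB = 9 TiB exactly, and each 288 GiB backend stays
far below the ~8 TiB per-memslot limit.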

-- 
Thanks,

David / dhildenb




* Re: 9 TiB vm memory creation
  2022-02-15  8:38           ` David Hildenbrand
@ 2022-02-15  9:40             ` Ani Sinha
  2022-02-15  9:44               ` David Hildenbrand
  2022-02-15 10:44             ` Daniel P. Berrangé
  1 sibling, 1 reply; 17+ messages in thread
From: Ani Sinha @ 2022-02-15  9:40 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Igor Mammedov, QEMU Developers

On Tue, Feb 15, 2022 at 2:08 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 15.02.22 09:12, Ani Sinha wrote:
> > On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 15.02.22 08:00, Ani Sinha wrote:
> >>>
> >>>
> >>> On Mon, 14 Feb 2022, David Hildenbrand wrote:
> >>>
> >>>> On 14.02.22 13:36, Igor Mammedov wrote:
> >>>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
> >>>>> Ani Sinha <ani@anisinha.ca> wrote:
> >>>>>
> >>>>>> Hi Igor:
> >>>>>>
> >>>>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
> >>>>>> system with the following commandline before either the system
> >>>>>> destabilized or the OOM killed killed qemu
> >>>>>>
> >>>>>> -m 2T,maxmem=9T,slots=1 \
> >>>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
> >>>>>> -machine memory-backend=mem0 \
> >>>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
> >>>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
> >>>>>>
> >>>>>> I have attached the debugcon output from 2 TiB vm.
> >>>>>> Is there any other commandline parameters or options I should try?
> >>>>>>
> >>>>>> thanks
> >>>>>> ani
> >>>>>
> >>>>> $ truncate -s 9T 9tb_sparse_disk.img
> >>>>> $ qemu-system-x86_64 -m 9T \
> >>>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
> >>>>>   -machine memory-backend=mem0
> >>>>>
> >>>>> works for me till GRUB menu, with sufficient guest kernel
> >>>>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
> >>>>> guest on it and inspect SMBIOS tables comfortably.
> >>>>>
> >>>>>
> >>>>> With KVM enabled it bails out with:
> >>>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> >>>>>
> >>>
> >>> I have seen this in my system but not always. Maybe I should have dug
> >>> deeper as to why i do see this all the time.
> >>>
> >>>>> all of that on a host with 32G of RAM/no swap.
> >>>>>
> >>>
> >>> My system in 16 Gib of main memory, no swap.
> >>>
> >>>>
> >>>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
> >>>>
> >>>> ~8 TiB (7,999999)
> >>>
> >>> That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
> >>> than 2 Gib * 4K (assuming 4K size pages).
> >>
> >> "pages" don't carry the unit "GiB/TiB", so I was talking about the
> >> actual size with 4k pages (your setup, I assume)
> >
> > yes I got that after reading your email again.
> > The interesting question now is how is redhat QE running 9 TiB vm with kvm?
>
> As already indicated by me regarding s390x only having single large NUMA
> nodes, x86 is usually using multiple NUMA nodes with such large memory.
> And QE seems to be using virtual numa nodes:
>
> Each of the 32 virtual numa nodes receive a:
>
>   -object memory-backend-ram,id=ram-node20,size=309237645312,host-
>    nodes=0-31,policy=bind
>
> which results in a dedicated KVM memslot (just like each DIMM would)
>
>
> 32 * 309237645312 == 9 TiB :)

Ah, I should have looked more closely at the other command lines before
shooting off the email. Yes, the limitation is per memslot, and they
have 32 slots, one per node.
OK, so should we do
kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
from the i386 kvm_arch_init()?



* Re: 9 TiB vm memory creation
  2022-02-15  9:40             ` Ani Sinha
@ 2022-02-15  9:44               ` David Hildenbrand
  2022-02-15  9:48                 ` Ani Sinha
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2022-02-15  9:44 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Igor Mammedov, QEMU Developers

On 15.02.22 10:40, Ani Sinha wrote:
> On Tue, Feb 15, 2022 at 2:08 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 15.02.22 09:12, Ani Sinha wrote:
>>> On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 15.02.22 08:00, Ani Sinha wrote:
>>>>>
>>>>>
>>>>> On Mon, 14 Feb 2022, David Hildenbrand wrote:
>>>>>
>>>>>> On 14.02.22 13:36, Igor Mammedov wrote:
>>>>>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
>>>>>>> Ani Sinha <ani@anisinha.ca> wrote:
>>>>>>>
>>>>>>>> Hi Igor:
>>>>>>>>
>>>>>>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
>>>>>>>> system with the following commandline before either the system
>>>>>>>> destabilized or the OOM killed killed qemu
>>>>>>>>
>>>>>>>> -m 2T,maxmem=9T,slots=1 \
>>>>>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
>>>>>>>> -machine memory-backend=mem0 \
>>>>>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
>>>>>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
>>>>>>>>
>>>>>>>> I have attached the debugcon output from 2 TiB vm.
>>>>>>>> Is there any other commandline parameters or options I should try?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> ani
>>>>>>>
>>>>>>> $ truncate -s 9T 9tb_sparse_disk.img
>>>>>>> $ qemu-system-x86_64 -m 9T \
>>>>>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
>>>>>>>   -machine memory-backend=mem0
>>>>>>>
>>>>>>> works for me till GRUB menu, with sufficient guest kernel
>>>>>>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
>>>>>>> guest on it and inspect SMBIOS tables comfortably.
>>>>>>>
>>>>>>>
>>>>>>> With KVM enabled it bails out with:
>>>>>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
>>>>>>>
>>>>>
>>>>> I have seen this in my system but not always. Maybe I should have dug
>>>>> deeper as to why i do see this all the time.
>>>>>
>>>>>>> all of that on a host with 32G of RAM/no swap.
>>>>>>>
>>>>>
>>>>> My system in 16 Gib of main memory, no swap.
>>>>>
>>>>>>
>>>>>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
>>>>>>
>>>>>> ~8 TiB (7,999999)
>>>>>
>>>>> That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
>>>>> than 2 Gib * 4K (assuming 4K size pages).
>>>>
>>>> "pages" don't carry the unit "GiB/TiB", so I was talking about the
>>>> actual size with 4k pages (your setup, I assume)
>>>
>>> yes I got that after reading your email again.
>>> The interesting question now is how is redhat QE running 9 TiB vm with kvm?
>>
>> As already indicated by me regarding s390x only having single large NUMA
>> nodes, x86 is usually using multiple NUMA nodes with such large memory.
>> And QE seems to be using virtual numa nodes:
>>
>> Each of the 32 virtual numa nodes receive a:
>>
>>   -object memory-backend-ram,id=ram-node20,size=309237645312,host-
>>    nodes=0-31,policy=bind
>>
>> which results in a dedicated KVM memslot (just like each DIMM would)
>>
>>
>> 32 * 309237645312 == 9 TiB :)
> 
> ah, I should have looked closely at the other commandlines before
> shooting off the email. Yes the limitation is per mem-slot and they
> have 32 slots one per node.
> ok so should we do
> kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
> from i386 kvm_arch_init()?


As I said, I'm not a friend of these workarounds in user space.

Assume you have one KVM memslot left and you hotplug a huge DIMM that
will consume more than one KVM memslot -- you're in trouble, because
hotplug will succeed but creating the second memslot will fail. So you
need additional logic in memory device code to special-case on these
corner cases.

We should try increasing the limit in KVM and handle it gracefully in
QEMU. But that's just my 2 cents.

-- 
Thanks,

David / dhildenb




* Re: 9 TiB vm memory creation
  2022-02-15  9:44               ` David Hildenbrand
@ 2022-02-15  9:48                 ` Ani Sinha
  2022-02-15  9:51                   ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Ani Sinha @ 2022-02-15  9:48 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Igor Mammedov, QEMU Developers

On Tue, Feb 15, 2022 at 3:14 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 15.02.22 10:40, Ani Sinha wrote:
> > On Tue, Feb 15, 2022 at 2:08 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 15.02.22 09:12, Ani Sinha wrote:
> >>> On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 15.02.22 08:00, Ani Sinha wrote:
> >>>>>
> >>>>>
> >>>>> On Mon, 14 Feb 2022, David Hildenbrand wrote:
> >>>>>
> >>>>>> On 14.02.22 13:36, Igor Mammedov wrote:
> >>>>>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
> >>>>>>> Ani Sinha <ani@anisinha.ca> wrote:
> >>>>>>>
> >>>>>>>> Hi Igor:
> >>>>>>>>
> >>>>>>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
> >>>>>>>> system with the following commandline before either the system
> >>>>>>>> destabilized or the OOM killed killed qemu
> >>>>>>>>
> >>>>>>>> -m 2T,maxmem=9T,slots=1 \
> >>>>>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
> >>>>>>>> -machine memory-backend=mem0 \
> >>>>>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
> >>>>>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
> >>>>>>>>
> >>>>>>>> I have attached the debugcon output from 2 TiB vm.
> >>>>>>>> Is there any other commandline parameters or options I should try?
> >>>>>>>>
> >>>>>>>> thanks
> >>>>>>>> ani
> >>>>>>>
> >>>>>>> $ truncate -s 9T 9tb_sparse_disk.img
> >>>>>>> $ qemu-system-x86_64 -m 9T \
> >>>>>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
> >>>>>>>   -machine memory-backend=mem0
> >>>>>>>
> >>>>>>> works for me till GRUB menu, with sufficient guest kernel
> >>>>>>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
> >>>>>>> guest on it and inspect SMBIOS tables comfortably.
> >>>>>>>
> >>>>>>>
> >>>>>>> With KVM enabled it bails out with:
> >>>>>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> >>>>>>>
> >>>>>
> >>>>> I have seen this in my system but not always. Maybe I should have dug
> >>>>> deeper as to why i do see this all the time.
> >>>>>
> >>>>>>> all of that on a host with 32G of RAM/no swap.
> >>>>>>>
> >>>>>
> >>>>> My system in 16 Gib of main memory, no swap.
> >>>>>
> >>>>>>
> >>>>>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
> >>>>>>
> >>>>>> ~8 TiB (7,999999)
> >>>>>
> >>>>> That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
> >>>>> than 2 Gib * 4K (assuming 4K size pages).
> >>>>
> >>>> "pages" don't carry the unit "GiB/TiB", so I was talking about the
> >>>> actual size with 4k pages (your setup, I assume)
> >>>
> >>> yes I got that after reading your email again.
> >>> The interesting question now is how is redhat QE running 9 TiB vm with kvm?
> >>
> >> As already indicated by me regarding s390x only having single large NUMA
> >> nodes, x86 is usually using multiple NUMA nodes with such large memory.
> >> And QE seems to be using virtual numa nodes:
> >>
> >> Each of the 32 virtual numa nodes receive a:
> >>
> >>   -object memory-backend-ram,id=ram-node20,size=309237645312,host-
> >>    nodes=0-31,policy=bind
> >>
> >> which results in a dedicated KVM memslot (just like each DIMM would)
> >>
> >>
> >> 32 * 309237645312 == 9 TiB :)
> >
> > ah, I should have looked closely at the other commandlines before
> > shooting off the email. Yes the limitation is per mem-slot and they
> > have 32 slots one per node.
> > ok so should we do
> > kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
> > from i386 kvm_arch_init()?
>
>
> As I said, I'm not a friend of these workarounds in user space.

Oh OK, I did not realize you were against s390x-like workarounds.



* Re: 9 TiB vm memory creation
  2022-02-15  9:48                 ` Ani Sinha
@ 2022-02-15  9:51                   ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2022-02-15  9:51 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Igor Mammedov, QEMU Developers

On 15.02.22 10:48, Ani Sinha wrote:
> On Tue, Feb 15, 2022 at 3:14 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 15.02.22 10:40, Ani Sinha wrote:
>>> On Tue, Feb 15, 2022 at 2:08 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 15.02.22 09:12, Ani Sinha wrote:
>>>>> On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 15.02.22 08:00, Ani Sinha wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 14 Feb 2022, David Hildenbrand wrote:
>>>>>>>
>>>>>>>> On 14.02.22 13:36, Igor Mammedov wrote:
>>>>>>>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
>>>>>>>>> Ani Sinha <ani@anisinha.ca> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Igor:
>>>>>>>>>>
>>>>>>>>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
>>>>>>>>>> system with the following commandline before either the system
>>>>>>>>>> destabilized or the OOM killed killed qemu
>>>>>>>>>>
>>>>>>>>>> -m 2T,maxmem=9T,slots=1 \
>>>>>>>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
>>>>>>>>>> -machine memory-backend=mem0 \
>>>>>>>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
>>>>>>>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
>>>>>>>>>>
>>>>>>>>>> I have attached the debugcon output from 2 TiB vm.
>>>>>>>>>> Is there any other commandline parameters or options I should try?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>> ani
>>>>>>>>>
>>>>>>>>> $ truncate -s 9T 9tb_sparse_disk.img
>>>>>>>>> $ qemu-system-x86_64 -m 9T \
>>>>>>>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
>>>>>>>>>   -machine memory-backend=mem0
>>>>>>>>>
>>>>>>>>> works for me till GRUB menu, with sufficient guest kernel
>>>>>>>>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
>>>>>>>>> guest on it and inspect SMBIOS tables comfortably.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With KVM enabled it bails out with:
>>>>>>>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
>>>>>>>>>
>>>>>>>
>>>>>>> I have seen this in my system but not always. Maybe I should have dug
>>>>>>> deeper as to why i do see this all the time.
>>>>>>>
>>>>>>>>> all of that on a host with 32G of RAM/no swap.
>>>>>>>>>
>>>>>>>
>>>>>>> My system in 16 Gib of main memory, no swap.
>>>>>>>
>>>>>>>>
>>>>>>>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
>>>>>>>>
>>>>>>>> ~8 TiB (7,999999)
>>>>>>>
>>>>>>> That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
>>>>>>> than 2 Gib * 4K (assuming 4K size pages).
>>>>>>
>>>>>> "pages" don't carry the unit "GiB/TiB", so I was talking about the
>>>>>> actual size with 4k pages (your setup, I assume)
>>>>>
>>>>> yes I got that after reading your email again.
>>>>> The interesting question now is how is redhat QE running 9 TiB vm with kvm?
>>>>
>>>> As already indicated by me regarding s390x only having single large NUMA
>>>> nodes, x86 is usually using multiple NUMA nodes with such large memory.
>>>> And QE seems to be using virtual numa nodes:
>>>>
>>>> Each of the 32 virtual numa nodes receive a:
>>>>
>>>>   -object memory-backend-ram,id=ram-node20,size=309237645312,host-
>>>>    nodes=0-31,policy=bind
>>>>
>>>> which results in a dedicated KVM memslot (just like each DIMM would)
>>>>
>>>>
>>>> 32 * 309237645312 == 9 TiB :)
>>>
>>> ah, I should have looked closely at the other commandlines before
>>> shooting off the email. Yes the limitation is per mem-slot and they
>>> have 32 slots one per node.
>>> ok so should we do
>>> kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
>>> from i386 kvm_arch_init()?
>>
>>
>> As I said, I'm not a friend of these workarounds in user space.
> 
> Oh ok, did not realize you were against s390x like workarounds.
> 

s390x doesn't support DIMMs so it was "easy" for them to just do it that
way :)

-- 
Thanks,

David / dhildenb




* Re: 9 TiB vm memory creation
  2022-02-15  8:38           ` David Hildenbrand
  2022-02-15  9:40             ` Ani Sinha
@ 2022-02-15 10:44             ` Daniel P. Berrangé
  1 sibling, 0 replies; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-02-15 10:44 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Ani Sinha, Igor Mammedov, QEMU Developers

On Tue, Feb 15, 2022 at 09:38:34AM +0100, David Hildenbrand wrote:
> On 15.02.22 09:12, Ani Sinha wrote:
> > On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 15.02.22 08:00, Ani Sinha wrote:
> >>>
> >>>
> >>> On Mon, 14 Feb 2022, David Hildenbrand wrote:
> >>>
> >>>> On 14.02.22 13:36, Igor Mammedov wrote:
> >>>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
> >>>>> Ani Sinha <ani@anisinha.ca> wrote:
> >>>>>
> >>>>>> Hi Igor:
> >>>>>>
> >>>>>> I failed to spawn a 9 Tib VM. The max I could do was a 2 TiB vm on my
> >>>>>> system with the following commandline before either the system
> >>>>>> destabilized or the OOM killed killed qemu
> >>>>>>
> >>>>>> -m 2T,maxmem=9T,slots=1 \
> >>>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
> >>>>>> -machine memory-backend=mem0 \
> >>>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
> >>>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
> >>>>>>
> >>>>>> I have attached the debugcon output from 2 TiB vm.
> >>>>>> Is there any other commandline parameters or options I should try?
> >>>>>>
> >>>>>> thanks
> >>>>>> ani
> >>>>>
> >>>>> $ truncate -s 9T 9tb_sparse_disk.img
> >>>>> $ qemu-system-x86_64 -m 9T \
> >>>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
> >>>>>   -machine memory-backend=mem0
> >>>>>
> >>>>> works for me till GRUB menu, with sufficient guest kernel
> >>>>> persuasion (i.e. CLI limit ram size to something reasonable) you can boot linux
> >>>>> guest on it and inspect SMBIOS tables comfortably.
> >>>>>
> >>>>>
> >>>>> With KVM enabled it bails out with:
> >>>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
> >>>>>
> >>>
> >>> I have seen this in my system but not always. Maybe I should have dug
> >>> deeper as to why i do see this all the time.
> >>>
> >>>>> all of that on a host with 32G of RAM/no swap.
> >>>>>
> >>>
> >>> My system in 16 Gib of main memory, no swap.
> >>>
> >>>>
> >>>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
> >>>>
> >>>> ~8 TiB (7,999999)
> >>>
> >>> That's not 8 Tib, thats 2 GiB. But yes, 0x8ff40000000 is certainly greater
> >>> than 2 Gib * 4K (assuming 4K size pages).
> >>
> >> "pages" don't carry the unit "GiB/TiB", so I was talking about the
> >> actual size with 4k pages (your setup, I assume)
> > 
> > yes I got that after reading your email again.
> > The interesting question now is how is redhat QE running 9 TiB vm with kvm?
> 
> As already indicated by me regarding s390x only having single large NUMA
> nodes, x86 is usually using multiple NUMA nodes with such large memory.

Yes, this is a documented requirement for KVM limits:

     https://access.redhat.com/articles/906543

   "3. Note that virtualized guests larger than 8 TB currently 
    require explicit virtual NUMA configuration, because the 
    maximum virtual NUMA node size is 8 TB."

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



