From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52778) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Uz9iu-0000xS-8b for qemu-devel@nongnu.org; Tue, 16 Jul 2013 14:11:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Uz9is-0003OU-Js for qemu-devel@nongnu.org; Tue, 16 Jul 2013 14:11:40 -0400 Date: Tue, 16 Jul 2013 15:11:36 -0300 From: Eduardo Habkost Message-ID: <20130716181136.GD19826@otherpad.lan.raisama.net> References: <1373995321-2470-1-git-send-email-aarcange@redhat.com> <20130716173844.GC19826@otherpad.lan.raisama.net> <51E586E6.1060001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51E586E6.1060001@redhat.com> Subject: Re: [Qemu-devel] [PATCH] fix guest physical bits to match host, to go beyond 1TB guests List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: Andrea Arcangeli , qemu-devel@nongnu.org, Gleb Natapov , qemu-stable@nongnu.org On Tue, Jul 16, 2013 at 07:46:14PM +0200, Paolo Bonzini wrote: > Il 16/07/2013 19:38, Eduardo Habkost ha scritto: > > On Tue, Jul 16, 2013 at 07:22:01PM +0200, Andrea Arcangeli wrote: > >> Without this patch the guest physical bits are advertised as 40, not > >> 44 or more depending on the hardware capability of the host. > >> > >> That leads to guest kernel crashes with injection of page faults 9 > >> (see oops: 0009) as bits above 40 in the guest pagetables are > >> considered reserved. > >> > >> exregion-0206 [324572448] [17] ex_system_memory_space: System-Memory (width 32) R/W 0 Address=00000000FED00000 > >> BUG: unable to handle kernel paging request at ffffc9006030e000 > >> IP: [] acpi_ex_system_memory_space_handler+0x23e/0x2cb > >> PGD e01f875067 PUD 1001f075067 PMD e0178d8067 PTE 80000000fed00173 > >> Oops: 0009 [#1] SMP > >> > >> (see PUD with bit >=40 set) > > > > I am not sure I understand what caused this: if we are advertising 40 > > physical bits to the guest, why are we ending up with a PUD with > > bit >= 40 set? > > Because we create a guest that has bigger memory than what we advertise > in CPUID. That means we never supported guests that large, doesn't it? Why is it a qemu-stable candidate, then? > > >> > >> Signed-off-by: Andrea Arcangeli > >> Reported-by: Chegu Vinod > >> --- > >> target-i386/cpu.c | 6 ++++++ > >> 1 file changed, 6 insertions(+) > >> > >> diff --git a/target-i386/cpu.c b/target-i386/cpu.c > >> index e3f75a8..0e65673 100644 > >> --- a/target-i386/cpu.c > >> +++ b/target-i386/cpu.c > >> @@ -2108,6 +2108,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, > >> /* 64 bit processor */ > >> /* XXX: The physical address space is limited to 42 bits in exec.c. */ > >> *eax = 0x00003028; /* 48 bits virtual, 40 bits physical */ > >> + if (kvm_enabled()) { > >> + uint32_t _eax; > >> + host_cpuid(0x80000000, 0, &_eax, NULL, NULL, NULL); > >> + if (_eax >= 0x80000008) > >> + host_cpuid(0x80000008, 0, eax, NULL, NULL, NULL); > >> + } > > > > We can't expose a different virtual machine depending on host > > capabilities. What if we live-migrate between hosts with different > > physical address bit sizes? > > Same as for vPMU or leaf 0xD: who knows. In practice, this has an > effect only for guests with 1T or more memory, otherwise the physical > memory is smaller than 40 bits. Leafs 0xA and 0xD are buggy, and we need to disable that "passthrough mode" behavior by default, because it breaks live-migration. If we want to expose new features to the guest we need to make those features configurable, not expose them automatically depending on host capabilities. (I was aware of the problem on vPMU/0xA, but I didn't notice 0xD was broken was well. Thanks for noting.) For physical bit size, what about extending it in a backwards-compatible way? Something like this: *eax = 0x0003000; /* 48 bits virtual */ if (ram_size < 1TB) { physical_size = 40; /* Keeping backwards compatibility */ } else if (ram_size < 4TB) { physical_size = 42; } else { abort(); } if (supported_host_physical_size() < physical_size) { abort(); } *eax |= physical_size; (Of course, the abort() calls should be replaced with proper error reporting) > > Paolo > > >> } else { > >> if (env->features[FEAT_1_EDX] & CPUID_PSE36) { > >> *eax = 0x00000024; /* 36 bits physical */ > >> > > > > -- Eduardo