From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58912)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1UzMnT-00020g-Sg
	for qemu-devel@nongnu.org; Wed, 17 Jul 2013 04:09:17 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1UzMnS-00072J-8d
	for qemu-devel@nongnu.org; Wed, 17 Jul 2013 04:09:15 -0400
Message-ID: <51E6511D.8070606@redhat.com>
Date: Wed, 17 Jul 2013 10:09:01 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1373995321-2470-1-git-send-email-aarcange@redhat.com>
	<20130716173844.GC19826@otherpad.lan.raisama.net>
	<51E586E6.1060001@redhat.com>
	<20130716181136.GD19826@otherpad.lan.raisama.net>
	<51E59DEE.5030603@redhat.com>
	<20130716194238.GG11420@otherpad.lan.raisama.net>
In-Reply-To: <20130716194238.GG11420@otherpad.lan.raisama.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] fix guest physical bits to match host,
 to go beyond 1TB guests
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>, qemu-devel@nongnu.org, Gleb Natapov <gleb@redhat.com>, qemu-stable@nongnu.org

On 16/07/2013 21:42, Eduardo Habkost wrote:
> On Tue, Jul 16, 2013 at 09:24:30PM +0200, Paolo Bonzini wrote:
>> On 16/07/2013 20:11, Eduardo Habkost wrote:
>>> For physical bit size, what about extending it in a backwards-compatible
>>> way? Something like this:
>>>
>>>     *eax = 0x0003000; /* 48 bits virtual */
>>>     if (ram_size < 1TB) {
>>>         physical_size = 40; /* Keeping backwards compatibility */
>>>     } else if (ram_size < 4TB) {
>>>         physical_size = 42;
>>
>> Why not go straight up to 44?
> 
> I simply trusted the comment saying: "The physical address space is
> limited to 42 bits in exec.c", and assumed we had a 42-bit limit
> somewhere else.

Yeah, that comment is obsolete.  We can now go up to 64 bits (but actually
only support 52, because that's what Intel says will be the limit; 4PB of
RAM should be good for everyone, as Bill Gates used to say).

So far Intel has been raising the physical address size in x16 steps,
i.e. 4 bits at a time (MAXPHYADDR was 36, then 40, then 44).  MAXPHYADDR
is what Intel calls what you wrote as physical_size.
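
To make the arithmetic concrete, here is a small sketch (mine, not code
from QEMU; all function names are made up for illustration).  It shows how
many bytes a given MAXPHYADDR covers, and how the physical and virtual
sizes are packed into EAX of CPUID leaf 0x80000008: bits 7:0 hold the
physical address bits, bits 15:8 the virtual ones.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with the given number of physical address bits. */
static uint64_t phys_addr_space_bytes(unsigned phys_bits)
{
    assert(phys_bits < 64);             /* shifting by 64 is undefined */
    return UINT64_C(1) << phys_bits;
}

/* Pack the two sizes the way CPUID leaf 0x80000008 reports them in EAX:
 * EAX[7:0] = physical address bits, EAX[15:8] = virtual address bits. */
static uint32_t encode_addr_sizes(unsigned phys_bits, unsigned virt_bits)
{
    return ((uint32_t)(virt_bits & 0xff) << 8) | (uint32_t)(phys_bits & 0xff);
}

/* Extract MAXPHYADDR from an EAX value of that leaf. */
static unsigned decode_phys_bits(uint32_t eax)
{
    return eax & 0xff;
}
```

So 36 bits is 64GB, 40 bits is 1TB, 44 bits is 16TB and 52 bits is 4PB;
each 4-bit step multiplies the addressable space by 16.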

>     if (ram_size < 1TB) {
>         physical_size = 40; /* Keeping backwards compatibility */
>     } else {
>         physical_size = msb(ram_size);
>     }
>     if (supported_host_physical_size() < physical_size) {
>         abort();
>     }

That's not enough, because there are processors with MAXPHYADDR=36.  So
perhaps, putting together both of your ideas:

     if (supported_host_physical_size() < msb(ram_size)) {
         abort();
     }
     if (ram_size < 64GB && !some_compat_prop) {
         physical_size = 36;
     } else if (ram_size < 1TB) {
         physical_size = 40;
     } else {
         physical_size = 44;
     }
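
Spelled out as compilable C, the same idea could look roughly like the
sketch below.  This is only an illustration under a few assumptions of
mine: msb(ram_size) is read as "number of address bits needed to cover
ram_size bytes starting at address 0" (which ignores holes in the guest
memory map), the host size is passed in as a parameter instead of being
queried, the compat property is a plain flag, and the function returns -1
where the pseudocode calls abort().  None of these names exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)
#define TiB (UINT64_C(1) << 40)

/* Address bits needed to cover ram_size bytes starting at address 0
 * (assumed meaning of msb(ram_size) in the pseudocode). */
static unsigned bits_needed(uint64_t ram_size)
{
    unsigned bits = 0;

    assert(ram_size > 0);
    while (bits < 64 && (UINT64_C(1) << bits) < ram_size) {
        bits++;
    }
    return bits;
}

/* Returns the guest MAXPHYADDR, or -1 if the host cannot address
 * ram_size.  compat_40 is a hypothetical stand-in for some_compat_prop:
 * when set, small guests keep the old value of 40. */
static int guest_phys_bits(uint64_t ram_size, unsigned host_phys_bits,
                           int compat_40)
{
    if (host_phys_bits < bits_needed(ram_size)) {
        return -1;
    }
    if (ram_size < 64 * GiB && !compat_40) {
        return 36;
    }
    if (ram_size < TiB) {
        return 40;
    }
    return 44;
}
```

For example, guest_phys_bits(4 * GiB, 36, 0) gives 36, the same guest with
compat_40 set gives 40, and a 2TB guest on a host with only 40 physical
bits gives -1.  Like the pseudocode, the sketch stops at 44 and does not
handle guests above 16TB.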

What do you think?

>> This makes sense too.  Though the best would be of course to use CPUID
>> values coming from the real processors, and only using 40 for backwards
>> compatibility.
> 
> We can't use the values coming from the real processors directly, or
> we will break live migration.

I said real processors, not host processors. :)

So a Core 2 has MAXPHYADDR=36, Nehalem has IIRC 40, Sandy Bridge has 44,
and so on.

> If we sent those CPUID bits as part of the migration stream, it would
> make it a little safer, but then it would be impossible for libvirt to
> tell if it is really possible to migrate from one host to another.

The libvirt problem would still remain, since libvirt currently doesn't
know the MAXPHYADDRs of the CPU models and would have to learn them.

I guess the above "artificial" computation of MAXPHYADDR is enough.

Paolo