From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39299)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Umhw5-0002z0-AN
	for qemu-devel@nongnu.org; Wed, 12 Jun 2013 06:05:54 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Umhvx-0007K9-0T
	for qemu-devel@nongnu.org; Wed, 12 Jun 2013 06:05:49 -0400
Received: from smtp.citrix.com ([66.165.176.89]:57721)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Umhvw-0007Jn-S0
	for qemu-devel@nongnu.org; Wed, 12 Jun 2013 06:05:40 -0400
Message-ID: <51B847E3.5010604@eu.citrix.com>
Date: Wed, 12 Jun 2013 11:05:23 +0100
From: George Dunlap
MIME-Version: 1.0
References: <51B1FF50.90406@eu.citrix.com>
	<403610A45A2B5242BD291EDAE8B37D3010E56731@SHSMSX102.ccr.corp.intel.com>
	<51B83E7A02000078000DD6E9@nat28.tlf.novell.com>
In-Reply-To: <51B83E7A02000078000DD6E9@nat28.tlf.novell.com>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable
	device with memory more than 3600M
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Jan Beulich
Cc: Tim Deegan , Yongjie Ren , yanqiangjun@huawei.com, Keir Fraser ,
	Ian Campbell , hanweidong@huawei.com, Xudong Hao ,
	Stefano Stabellini , luonengjun@huawei.com, qemu-devel@nongnu.org,
	wangzhenguo@huawei.com, xiaowei.yang@huawei.com,
	arei.gonglei@huawei.com, Paolo Bonzini , YongweiX Xu ,
	SongtaoX Liu , "xen-devel@lists.xensource.com"

On 12/06/13 08:25, Jan Beulich wrote:
>>>> On 11.06.13 at 19:26, Stefano Stabellini wrote:
>> I went through the code that maps the PCI MMIO regions in hvmloader
>> (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
>> maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
>> region is larger than 512MB.
>>
>> Maybe we could just relax this condition and map the device memory to
>> high memory no matter the size of the MMIO region if the PCI bar is
>> 64-bit?
> I can only recommend not to: For one, guests not using PAE or
> PSE-36 can't map such space at all (and older OSes may not
> properly deal with 64-bit BARs at all). And then one would generally
> expect this allocation to be done top down (to minimize risk of
> running into RAM), and doing so is going to present further risks of
> incompatibilities with guest OSes (Linux for example learned only in
> 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
> 3.10-rc5 ioremap_pte_range(), while using "u64 pfn", passes the
> PFN to pfn_pte(), the respective parameter of which is
> "unsigned long").
>
> I think this ought to be done in an iterative process - if all MMIO
> regions together don't fit below 4G, the biggest one should be
> moved up beyond 4G first, followed by the next to biggest one
> etc.

First of all, the proposal to move the PCI BAR up to the 64-bit range is
a temporary work-around.  It should only be done if a device doesn't fit
in the current MMIO range.

We have four options here:
1. Don't do anything
2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
   don't fit
3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
   is memory).
4. Add a mechanism to tell qemu that memory is being relocated.

Number 4 is definitely the right answer long-term, but we just don't
have time to do that before the 4.3 release.  We're not sure yet if #3
is possible; even if it is, it may have unpredictable knock-on effects.

Doing #2, it is true that many guests will be unable to access the
device because of 32-bit limitations.  However, in #1, *no* guests will
be able to access the device.  At least in #2, *many* guests will be
able to do so.  In any case, apparently #2 is what KVM does, so having
the limitation on guests is not without precedent.
It's also likely to be a somewhat tested configuration (unlike #3, for
example).

 -George
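For readers following along: the iterative relocation Jan describes (move the biggest MMIO region above 4G first, then the next biggest, until the rest fits below 4G) can be sketched roughly as below. This is an illustrative sketch only; the struct, field names, and hole size are hypothetical and are not hvmloader's actual code (which lives in tools/firmware/hvmloader/pci.c), and it ignores the natural-alignment constraints real BAR placement must honor.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical BAR descriptor, for illustration only. */
struct bar {
    uint64_t size;   /* BAR size in bytes (a power of two) */
    bool is_64bit;   /* may the BAR be placed above 4G? */
    bool high;       /* out: relocated above 4G */
};

/* Sort largest-first so each relocation frees the most low space. */
static int cmp_size_desc(const void *a, const void *b)
{
    const struct bar *x = a, *y = b;
    return (x->size < y->size) - (x->size > y->size);
}

/*
 * Greedy sketch of the iterative process: while the BARs do not all
 * fit in the 32-bit MMIO hole, move the largest 64-bit-capable BAR
 * above 4G.  Returns the number of BARs moved high, or -1 if the
 * remaining 32-bit BARs still do not fit.
 */
int relocate_bars(struct bar *bars, int n, uint64_t hole_size)
{
    uint64_t low_total = 0;
    int moved = 0;

    for (int i = 0; i < n; i++) {
        bars[i].high = false;
        low_total += bars[i].size;
    }

    qsort(bars, n, sizeof(bars[0]), cmp_size_desc);

    for (int i = 0; i < n && low_total > hole_size; i++) {
        if (!bars[i].is_64bit)
            continue;            /* 32-bit BARs must stay below 4G */
        bars[i].high = true;
        low_total -= bars[i].size;
        moved++;
    }

    return low_total <= hole_size ? moved : -1;
}
```

Note the deliberate consequence discussed above: a 32-bit-only guest loses sight of whatever is moved high, but everything left below 4G remains reachable, which is the #2 trade-off versus #1's total loss of the device.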