From: Jan Beulich
Date: Wed, 12 Jun 2013 11:11:31 +0100
Subject: Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest could't find bootable device with memory more than 3600M
Message-Id: <51B8657302000078000DD7FD@nat28.tlf.novell.com>
In-Reply-To: <51B847E3.5010604@eu.citrix.com>
To: George Dunlap
Cc: Tim Deegan, Yongjie Ren, yanqiangjun@huawei.com, Keir Fraser, Ian Campbell, hanweidong@huawei.com, Xudong Hao, Stefano Stabellini, luonengjun@huawei.com, qemu-devel@nongnu.org, wangzhenguo@huawei.com, xiaowei.yang@huawei.com, arei.gonglei@huawei.com, Paolo Bonzini, YongweiX Xu, SongtaoX Liu, xen-devel@lists.xensource.com

>>> On 12.06.13 at 12:05, George Dunlap wrote:
> On 12/06/13 08:25, Jan Beulich wrote:
>>>>> On 11.06.13 at 19:26, Stefano Stabellini wrote:
>>> I went through the code that maps the PCI MMIO regions in hvmloader
>>> (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
>>> maps the PCI region to high memory if the PCI BAR is 64-bit and the MMIO
>>> region is larger than 512MB.
>>>
>>> Maybe we could just relax this condition and map the device memory to
>>> high memory no matter the size of the MMIO region if the PCI BAR is
>>> 64-bit?
>> I can only recommend not to: for one, guests not using PAE or
>> PSE-36 can't map such space at all (and older OSes may not
>> properly deal with 64-bit BARs at all). And then one would generally
>> expect this allocation to be done top down (to minimize the risk of
>> running into RAM), and doing so is going to present further risks of
>> incompatibilities with guest OSes (Linux, for example, learned only in
>> 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
>> 3.10-rc5 ioremap_pte_range(), while using "u64 pfn", passes the
>> PFN to pfn_pte(), the respective parameter of which is
>> "unsigned long").
>>
>> I think this ought to be done in an iterative process - if all MMIO
>> regions together don't fit below 4G, the biggest one should be
>> moved up beyond 4G first, followed by the next biggest one, etc.
>
> First of all, the proposal to move the PCI BAR up to the 64-bit range is
> a temporary work-around. It should only be done if a device doesn't fit
> in the current MMIO range.
>
> We have four options here:
> 1. Don't do anything.
> 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
> don't fit.
> 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
> is memory).
> 4. Add a mechanism to tell qemu that memory is being relocated.
>
> Number 4 is definitely the right answer long-term, but we just don't
> have time to do that before the 4.3 release. We're not sure yet if #3
> is possible; even if it is, it may have unpredictable knock-on effects.
>
> Doing #2, it is true that many guests will be unable to access the
> device because of 32-bit limitations. However, with #1 *no* guests will
> be able to access the device, whereas with #2 at least *many* guests
> will be able to do so. In any case, #2 is apparently what KVM does, so
> having the limitation on guests is not without precedent. It's also
> likely to be a somewhat tested configuration (unlike #3, for example).

That's all fine with me. My objection was to Stefano's suggestion of
assigning high addresses to _all_ 64-bit capable BARs, not just the
biggest one(s).
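Purely to illustrate the "biggest one(s) first" selection (a rough sketch
with hypothetical structures and helper names, not the actual hvmloader
code), the decision of which BARs to place above 4G could look like this:

    #include <stdint.h>

    struct bar {
        uint64_t size;      /* decoded BAR size in bytes */
        int is_64bit;       /* set if the BAR is 64-bit capable */
        int above_4g;       /* placement decision: relocate above 4G */
    };

    /*
     * bars[] is assumed sorted by decreasing size, as pci_setup()
     * already does for its allocation pass.  Relocate only as many of
     * the biggest 64-bit capable BARs as needed for the remainder to
     * fit into the MMIO hole below 4G.
     */
    static void choose_high_bars(struct bar *bars, unsigned int nr,
                                 uint64_t mmio_hole_size)
    {
        uint64_t low_size = 0;
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            low_size += bars[i].size;

        for ( i = 0; i < nr && low_size > mmio_hole_size; i++ )
        {
            if ( !bars[i].is_64bit )
                continue;
            bars[i].above_4g = 1;
            low_size -= bars[i].size;
        }
    }

That way a single over-sized BAR (the case triggering this bug) ends up
high, while everything a 32-bit capable guest can reach stays below 4G.

And to make the ioremap()/pfn_pte() point quoted above concrete: a u64
PFN passed to a parameter of type "unsigned long" is silently truncated
on a 32-bit build, so a BAR placed near the top of the physical address
space cannot be mapped correctly there. A minimal standalone
illustration (not the kernel code):

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for a callee taking "unsigned long" (as pfn_pte() does);
     * on a 32-bit build that type is only 32 bits wide. */
    static unsigned long callee(unsigned long pfn)
    {
        return pfn;
    }

    int main(void)
    {
        uint64_t pfn = 1ULL << 34;   /* e.g. a BAR placed at 2^46 */

        /* Built with -m32 this prints 0: the upper PFN bits are lost. */
        printf("caller: %#llx, callee: %#lx\n",
               (unsigned long long)pfn, callee(pfn));
        return 0;
    }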
Jan