From: Jan Beulich
Date: Wed, 12 Jun 2013 11:11:31 +0100
Subject: Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest could't find bootable device with memory more than 3600M
Message-Id: <51B8657302000078000DD7FD@nat28.tlf.novell.com>
In-Reply-To: <51B847E3.5010604@eu.citrix.com>
To: George Dunlap
Cc: Tim Deegan, Yongjie Ren, yanqiangjun@huawei.com, Keir Fraser, Ian Campbell, hanweidong@huawei.com, Xudong Hao, Stefano Stabellini, luonengjun@huawei.com, qemu-devel@nongnu.org, wangzhenguo@huawei.com, xiaowei.yang@huawei.com, arei.gonglei@huawei.com, Paolo Bonzini, YongweiX Xu, SongtaoX Liu, xen-devel@lists.xensource.com

>>> On 12.06.13 at 12:05, George Dunlap wrote:
> On 12/06/13 08:25, Jan Beulich wrote:
>>>>> On 11.06.13 at 19:26, Stefano Stabellini wrote:
>>> I went through the code that maps the PCI MMIO regions in hvmloader
>>> (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
>>> maps the PCI region to high memory if the PCI BAR is 64-bit and the MMIO
>>> region is larger than 512MB.
>>>
>>> Maybe we could just relax this condition and map the device memory to
>>> high memory no matter the size of the MMIO region if the PCI BAR is
>>> 64-bit?
>> I can only recommend not to: for one, guests not using PAE or
>> PSE-36 can't map such space at all (and older OSes may not
>> properly deal with 64-bit BARs at all). And then one would generally
>> expect this allocation to be done top down (to minimize the risk of
>> running into RAM), and doing so is going to present further risks of
>> incompatibilities with guest OSes (Linux, for example, learned only in
>> 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
>> 3.10-rc5 ioremap_pte_range(), while using "u64 pfn", passes the
>> PFN to pfn_pte(), the respective parameter of which is
>> "unsigned long").
>>
>> I think this ought to be done in an iterative process - if all MMIO
>> regions together don't fit below 4G, the biggest one should be
>> moved up beyond 4G first, followed by the next biggest one, etc.
>
> First of all, the proposal to move the PCI BAR up to the 64-bit range is
> a temporary work-around. It should only be done if a device doesn't fit
> in the current MMIO range.
>
> We have four options here:
> 1. Don't do anything.
> 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they
> don't fit.
> 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks
> is memory).
> 4. Add a mechanism to tell qemu that memory is being relocated.
>
> Number 4 is definitely the right answer long-term, but we just don't
> have time to do that before the 4.3 release. We're not sure yet if #3
> is possible; even if it is, it may have unpredictable knock-on effects.
>
> Doing #2, it is true that many guests will be unable to access the
> device because of 32-bit limitations. However, with #1 *no* guests will
> be able to access the device, whereas with #2 at least *many* guests
> will be able to do so. In any case, #2 is apparently what KVM does, so
> having the limitation on guests is not without precedent. It's also
> likely to be a somewhat tested configuration (unlike #3, for example).

That's all fine with me. My objection was to Stefano's suggestion of
assigning high addresses to _all_ 64-bit capable BARs, not just the
biggest one(s).
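Purely to illustrate the "biggest one(s) first" selection (a rough sketch
with hypothetical structures and helper names, not the actual hvmloader
code), the decision of which BARs to place above 4G could look like this:

    #include <stdint.h>

    struct bar {
        uint64_t size;      /* decoded BAR size in bytes */
        int is_64bit;       /* set if the BAR is 64-bit capable */
        int above_4g;       /* placement decision: relocate above 4G */
    };

    /*
     * bars[] is assumed sorted by decreasing size, as pci_setup()
     * already does for its allocation pass.  Relocate only as many of
     * the biggest 64-bit capable BARs as needed for the remainder to
     * fit into the MMIO hole below 4G.
     */
    static void choose_high_bars(struct bar *bars, unsigned int nr,
                                 uint64_t mmio_hole_size)
    {
        uint64_t low_size = 0;
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            low_size += bars[i].size;

        for ( i = 0; i < nr && low_size > mmio_hole_size; i++ )
        {
            if ( !bars[i].is_64bit )
                continue;
            bars[i].above_4g = 1;
            low_size -= bars[i].size;
        }
    }

That way a single over-sized BAR (the case triggering this bug) ends up
high, while everything a 32-bit capable guest can reach stays below 4G.

And to make the ioremap()/pfn_pte() point quoted above concrete: a u64
PFN passed to a parameter of type "unsigned long" is silently truncated
on a 32-bit build, so a BAR placed near the top of the physical address
space cannot be mapped correctly there. A minimal standalone
illustration (not the kernel code):

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for a callee taking "unsigned long" (as pfn_pte() does);
     * on a 32-bit build that type is only 32 bits wide. */
    static unsigned long callee(unsigned long pfn)
    {
        return pfn;
    }

    int main(void)
    {
        uint64_t pfn = 1ULL << 34;   /* e.g. a BAR placed at 2^46 */

        /* Built with -m32 this prints 0: the upper PFN bits are lost. */
        printf("caller: %#llx, callee: %#lx\n",
               (unsigned long long)pfn, callee(pfn));
        return 0;
    }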
Jan