From: Gordan Bobic
Subject: Re: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0
Date: Thu, 25 Jul 2013 23:23:09 +0100
Message-ID: <51F1A54D.6070906@bobich.net>
References: <51EF04D8.1090600@bobich.net>
 <20130724140813.GH2518@phenom.dumpdata.com>
 <2aa84a31b7b17c2ea6d8483a281ad3f5@mail.shatteredsilicon.net>
 <20130724160639.GB5804@phenom.dumpdata.com>
 <8426aecf79e7f55c21bbe259014591a2@mail.shatteredsilicon.net>
 <20130724163102.GA6308@phenom.dumpdata.com>
 <51F051F1.5050806@bobich.net>
 <51F19D11.1090200@bobich.net>
In-Reply-To: <51F19D11.1090200@bobich.net>
To: George Dunlap
Cc: Andrew Cooper, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 07/25/2013 10:48 PM, Gordan Bobic wrote:
> On 07/25/2013 08:18 PM, George Dunlap wrote:
>> On Wed, Jul 24, 2013 at 11:15 PM, Gordan Bobic wrote:
>>> Attached are the logs (loglvl=all) and configs for 2GB (working) and 8GB
>>> (screen corruption + domU crash + sometimes dom0 crashing with it).
>>>
>>> I can see in the xl dmesg log for the 8GB case that there is memory
>>> remapping going on to allow for the lowmem MMIO hole, but it doesn't
>>> seem to help.
>>
>> There's a possibility that it's actually got nothing to do with
>> relocation, but with bugs in your hardware.
>
> That wouldn't surprise me at all, unfortunately. :(
>
>> Can you try:
>> * Set the guest memory to 3600
>> * Boot the guest, and check to make sure that xl dmesg shows it does
>> *not* relocate memory?
>> * Report whether it crashes?
>
> xl dmesg from booting a Linux domU with 3600MB is attached.
> The crash is never immediate; both Linux and Windows boot fine.
> But when a large 3D application, such as a game, loads, frame buffer
> corruption is immediately visible, and the domU typically locks up a few
> seconds later. Occasionally, it takes the host down with it.
>
>> If it's a bug in the hardware, I would expect to see that memory was
>> not relocated, but that the system will lock up anyway.
>
> That is indeed what seems to happen - the memory map looks OK, with no
> overlaps between the PCI memory and ROM ranges and the usable or
> reserved e820 regions.
>
>> Can you also do lspci -vvv in dom0 before assigning the device and
>> attach the output?
>
> I have attached it, but it was captured after assigning the device, not
> before - I'll need to reboot for that. Do you expect the mapping in dom0
> to differ before and after assigning the device to the domU?
>
>> The hardware bug we've seen is this: in order for the IOMMU to work
>> properly, *all* DMA transactions must be passed up to the root bridge
>> so the IOMMU can translate the addresses from guest address to host
>> address. Unfortunately, an awful lot of bridges will not do this
>> properly, which means that the address is not translated, which means
>> that if a *guest* memory address overlaps a *host* MMIO range, badness
>> ensues.
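The failure mode described in the quote above can be illustrated with a toy routing model. This is a sketch under simplifying assumptions, not Xen or PCIe code: `Window`, `route_dma` and the `forward_all_upstream` flag are hypothetical names. The flag stands in for whatever makes a bridge send every DMA request up to the root complex; on hardware that supports it, this is typically the job of PCIe Access Control Services (ACS) request redirection.

```python
# Toy model only (hypothetical names, not Xen or PCIe code): shows how an
# untranslated guest-physical DMA address can be claimed by a host MMIO
# window when a bridge does not forward all DMA upstream to the IOMMU.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A host MMIO range [start, end) decoded by the bridge."""
    name: str
    start: int
    end: int

    def contains(self, addr: int) -> bool:
        return self.start <= addr < self.end


def route_dma(guest_addr: int, peer_windows: list[Window],
              forward_all_upstream: bool) -> str:
    """Return where a DMA request to guest_addr ends up in this model.

    The passed-through device issues DMA using *guest* physical addresses;
    only the IOMMU at the root complex can translate them to host addresses.
    """
    if not forward_all_upstream:
        # A misbehaving bridge decodes the raw guest address itself and,
        # if it falls in a host MMIO window, routes it there peer-to-peer.
        for w in peer_windows:
            if w.contains(guest_addr):
                return f"peer MMIO: {w.name}"
    # Otherwise the request reaches the root complex and gets translated.
    return "root complex (IOMMU translates guest -> host)"


if __name__ == "__main__":
    # Host BAR from the lspci output: b4000000, size 64M.
    gpu_bar3 = Window("GPU Region 3 (host)", 0xB4000000, 0xB8000000)
    # A guest RAM buffer at 0xb5000000 lies inside that host BAR range.
    print(route_dma(0xB5000000, [gpu_bar3], forward_all_upstream=False))
    print(route_dma(0xB5000000, [gpu_bar3], forward_all_upstream=True))
```

With the misbehaving bridge, the write aimed at guest RAM lands in the host BAR instead of reaching the IOMMU; with full upstream forwarding, the same address is translated normally. Guest addresses outside every host MMIO window are unaffected either way, which matches the observation that small guests work.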
>
> Hmm, looking at xl dmesg vs dom0 lspci, that does appear to be the case:
>
> xl dmesg:
> (XEN) HVM24: E820 table:
> (XEN) HVM24: [00]: 00000000:00000000 - 00000000:0009e000: RAM
> (XEN) HVM24: [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
> (XEN) HVM24: HOLE: 00000000:000a0000 - 00000000:000e0000
> (XEN) HVM24: [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
> (XEN) HVM24: [03]: 00000000:00100000 - 00000000:e0000000: RAM
> (XEN) HVM24: HOLE: 00000000:e0000000 - 00000000:fc000000
> (XEN) HVM24: [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
> (XEN) HVM24: [05]: 00000001:00000000 - 00000001:00800000: RAM
>
> lspci:
> 08:00.0 VGA compatible controller: nVidia Corporation GF100
>         Region 0: Memory at f8000000 (32-bit, non-prefetchable) [disabled] [size=32M]
>         Region 1: Memory at b8000000 (64-bit, prefetchable) [disabled] [size=128M]
>         Region 3: Memory at b4000000 (64-bit, prefetchable) [disabled] [size=64M]
>
> Unless I'm reading this wrong, it means that physical GPU region 0 falls
> in the domU MMIO hole (e0000000-fc000000), and GPU regions 1 and 3 are
> in the domU RAM area.
>
> b4000000 = 2880MB

Correction - my other GPU has a BAR mapped lower, at 0xa8000000, which is
2688MB. So I set the domU memory to 2688MB, and lo and behold, it doesn't
crash, and games work just fine without the frame buffer getting
corrupted.

Now, if I am understanding the basic nature of the problem correctly, this
_could_ be worked around by ensuring that vBAR = pBAR, i.e. that the guest
sees each BAR at the same address as the host. In that case no guest RAM
can sit in those host BAR ranges, so there is no room for the mis-mapped
memory overwrites to occur. Is that correct?

I guess I could test this easily enough by applying the vBAR = pBAR hack.

Gordan
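The by-eye comparison of the guest e820 map against the host BARs, and the hex-to-MB arithmetic, can be sketched in Python. The tables are transcribed from the xl dmesg and lspci excerpts in this message; `classify` and `max_safe_lowmem_mb` are hypothetical helper names, not Xen tools. The sketch assumes all host BARs sit below 4GB (true for this excerpt) and ignores any RAM that hvmloader relocates above 4GB.

```python
# Sketch (hypothetical helpers, not Xen tooling): check which host BARs of
# the passed-through GPU overlap guest RAM in the guest's e820 map.

# Guest e820 entries from xl dmesg: (start, end_exclusive, type).
# The HOLE lines are not e820 entries, so they are left out.
GUEST_E820 = [
    (0x000000000, 0x00009E000, "RAM"),
    (0x00009E000, 0x0000A0000, "RESERVED"),
    (0x0000E0000, 0x000100000, "RESERVED"),
    (0x000100000, 0x0E0000000, "RAM"),
    (0x0FC000000, 0x100000000, "RESERVED"),
    (0x100000000, 0x100800000, "RAM"),
]

# Host BARs of 08:00.0 from lspci -vvv: name -> (base, size in bytes).
HOST_BARS = {
    "Region 0": (0xF8000000, 32 << 20),
    "Region 1": (0xB8000000, 128 << 20),
    "Region 3": (0xB4000000, 64 << 20),
}


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def classify(bars, e820):
    """Map each BAR name to the e820 types it overlaps, or ["hole"] if none."""
    result = {}
    for name, (base, size) in bars.items():
        hits = [kind for start, end, kind in e820
                if overlaps(base, base + size, start, end)]
        result[name] = hits or ["hole"]
    return result


def max_safe_lowmem_mb(bar_bases):
    """Lowest host BAR base in MB: an upper bound on guest RAM kept below it."""
    return min(bar_bases) >> 20


if __name__ == "__main__":
    for name, hits in classify(HOST_BARS, GUEST_E820).items():
        print(name, hits)
    bases = [base for base, _ in HOST_BARS.values()]
    print(max_safe_lowmem_mb(bases))                 # 2880 (0xb4000000)
    print(max_safe_lowmem_mb(bases + [0xA8000000]))  # 2688 (other GPU's BAR)
```

Running it reports Region 0 in the hole and Regions 1 and 3 overlapping guest RAM. The lowest BAR base gives 2880MB for this GPU and 2688MB once the other GPU's BAR at 0xa8000000 is included, which matches the memory size that stopped the crashes.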