From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sander Eikelenboom
Subject: Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Date: Thu, 6 Sep 2012 00:52:20 +0200
Message-ID: <679529047.20120906005220@eikelenboom.it>
References: <1136369816.20120904183757@eikelenboom.it>
 <20120904163347.GH23361@phenom.dumpdata.com>
 <143844933.20120904191941@eikelenboom.it>
 <1813712325.20120904213459@eikelenboom.it>
 <048EAD622912254A9DEA24C1734613C18C864C3C5D@FTLPMAILBOX02.citrite.net>
 <20120905140600.GA5844@phenom.dumpdata.com>
 <1014998302.20120905163848@eikelenboom.it>
 <20120905201933.GA27814@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20120905201933.GA27814@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Robert Phillips, Ben Guthro, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

Wednesday, September 5, 2012, 10:19:33 PM, you wrote:

> On Wed, Sep 05, 2012 at 04:38:48PM +0200, Sander Eikelenboom wrote:
>>
>> Wednesday, September 5, 2012, 4:06:01 PM, you wrote:
>>
>> > On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
>> >> Ben,
>> >>
>> >> You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
>> >> Here are my findings.
>> >>
>> >> I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
>> >>
>> >> That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c
>>
>> > And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address..
>>
>> >> which is where I made my change.
>> >>
>> >> The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
>> >>
>> >> kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.
>>
>> > Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should
>> > use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not
>> > used anymore..
>>
>> >>
>> >> The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c
>>
>> > Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then?
>> > Even before this patch set?
>> >>
>> >> Since the storage holding the old mfn got overwritten, the unmapping was being done incorrectly. The balloon code detected that and bugged at drivers/xen/balloon.c:359
>> >>
>>
>> > Hmm, I believe the storage for holding the old mfn was/is page->index.
>> >>
>> >> My patch simply adds another member called old_mfn to struct gnttab_map_grant_ref rather than trying to overload dev_bus_addr.
>> >>
>> >> I don't know if Sander's bug is the same or related. The BUG_ON at drivers/xen/balloon.c:359 is quite general. It simply asserts that we are not trying to re-map a valid mapping.
>>
>> > Right. Somehow he ends up with valid mappings where there should be none. And lots of them.
>>
>> It's something between kernel v3.4.1 and v3.5.3, haven't had time to narrow it down yet.
>> Any suggestions for specific commits i could try to quickly bisect this one ?

> These are the ones that went in:
> ea61fc0 xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
> b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
> 6878c32 xen/blkfront: Add WARN to deal with misbehaving backends.
> 5e62625 xen/setup: filter APERFMPERF cpuid feature out
> 8c9ce60 xen/blkback: Copy id field when doing BLKIF_DISCARD.
> 58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated.
> 780dbcd xen/pci: Check for PCI bridge before using it.
> 5e152e6 xen/events: Add WARN_ON when quick lookup found invalid type.
> 5842f57 xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
> a32c88b xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN
> 2e5ad6b xen/hvc: Collapse error logic.
> 7664810 xen: do not disable netfront in dom0
> 68c2c39 xen: do not map the same GSI twice in PVHVM guests.
> 201a52b hvc_xen: NULL dereference on allocation failure
> d79d595 xen: Add selfballoning memory reservation tunable.
> d2fb4c5 xenbus: Add support for xenbus backend in stub domain
> 2f1bd67 xen/smp: unbind irqworkX when unplugging vCPUs.
> 87e4baa x86/xen/apic: Add missing #include
> 323f90a xen-acpi-processor: Add missing #include
> 8605067 xen-blkfront: module exit handling adjustments
> e77c78c xen-blkfront: properly name all devices
> f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
> 211063d xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
> 1ff2b0c xen: implement IRQ_WORK_VECTOR handler
> f447d56 xen: implement apic ipi interface
> 83d51ab xen/setup: update VA mapping when releasing memory during setup
> 96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar.
> 2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
> ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
> 9438ef7 x86/apic: Fix UP boot crash
> ab6ec39 xen/apic: implement io apic read with hypercall
> 27abd14 Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'"
> 31b3c9d xen/x86: Implement x86_apic_ops
> 4a8e2a3 x86/apic: Replace io_apic_ops with x86_io_apic_ops.
> 977f857 PCI: move mutex locking out of pci_dev_reset function
> 569ca5b xen/gnttab: add deferred freeing logic
> 9fe2a70 debugfs: Add support to print u32 array in debugfs
> 940713b xen/p2m: An early bootup variant of set_phys_to_machine
> d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
> cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
> 3f3aaea xen/p2m: Move code around to allow for better re-usage.

> Narrowing this down (so ignore APIC bootup, drivers, etc) these could be it:
> b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
> 58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated.
> d79d595 xen: Add selfballoning memory reservation tunable.
> f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
> 83d51ab xen/setup: update VA mapping when releasing memory during setup
> 96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar.
> 2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
> ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
> 940713b xen/p2m: An early bootup variant of set_phys_to_machine
> d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
> cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
> 3f3aaea xen/p2m: Move code around to allow for better re-usage.

> About nine of them deal with dom0_mem=max ballooning up right, so if you
> ignore those:
> b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
> d79d595 xen: Add selfballoning memory reservation tunable.
> f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls

> Try reverting any of those.

Ah, I missed your email since my hosting provider was down :-(
But anyway, I have done a git bisect in the meantime, which leads to:

[f62805f1f30a40e354bd036b4cb799863a39be4b] xen: enter/exit lazy_mmu_mode around m2p_override calls

> And if nothing works there then we can try to revert the ones that
> deal with 'dom0_mem=max:XX'..

> I also need to be able to reproduce this. You said you can only reproduce this
> on your Intel box - is this a fast Intel machine? It also looks like you only
> have 2GB in the machine - and reserve 1GB to the dom0.

The machine is a quad-core Q9400 @ 2.66 GHz: not very fast, but not very slow either.

> If you manually (so don't start the guest), balloon down - say to 512MB and then launch
> a guest do you see this problem?

Should I use "xl mem-max domain-id mem" or "xl mem-set domain-id mem" for that?

Perhaps a silly question, but why is it ballooning anyway? I have set dom0's memory, and there is enough left to create the domain ... or at least there should be ...

--
Sander