From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoffer Dall
Subject: Re: [ARM] Bash often segfaults in Dom0 with the latest Xen
Date: Wed, 5 Jun 2013 07:30:23 -0700
References: <51AE6DFD.3060308@linaro.org> <51AF25A2.5010207@linaro.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <51AF25A2.5010207@linaro.org>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Julien Grall
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 5 June 2013 04:48, Julien Grall wrote:
> On 06/05/2013 02:38 AM, Christoffer Dall wrote:
>
>> On 4 June 2013 15:45, Julien Grall wrote:
>>> Hi all,
>>>
>>> For a couple of weeks, I have been tracking an issue with Xen on ARM with no luck.
>>>
>>> I have run out of ideas, so I am sending this email to get advice from the community.
>>>
>>> Most of the time bash will abort with random errors in dom0:
>>>   - page fault (data and prefetch abort)
>>>   - memory corruption (malloc corruption and invalid pointer)
>>>
>>> It is easy to reproduce by running ./configure on the xen tree.
>>>
>>> My environment is an arndale board:
>>>   - linux linaro 13.05 (using arndale_xen_dom0_defconfig and exynos5250_arndale.dts)
>>>   - opensuse 12.03 (http://en.opensuse.org/HCL:Arndale)
>>>   - xen upstream
>>>
>>> The linux tree can be retrieved from git://xenbits.xen.org/people/julieng/linux-arm.git
>>> using the branch linaro-3.10.
>>> That branch is based on the linaro tree with some patches for the dts and xen.
>>>
>>> The issue also occurs on the versatile express, but it is harder to reproduce there.
>>> Here the environment is:
>>>   - linux linaro 13.05 (using vexpress_xen_dom0_defconfig and vexpress_v2p_ca15_a7.dtb)
>>>   - ubuntu linaro 13.05
>>>   - xen upstream
>>>
>>> I have tried different distributions and linux versions; the issue was the same.
>>> I did some testing to narrow down the bug and arrived at the following test case:
>>>
>>> Only dom0 is running and each VCPU is pinned to a specific cpu
>>> (vcpu0 -> cpu0 and vcpu1 -> cpu1).
>>>
>>> The patch below removes the WFI trap and consequently prevents a VCPU from moving to
>>> another physical CPU.
>>> =========================================
>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>> index 6cfba1a..e89ca15 100644
>>> --- a/xen/arch/arm/traps.c
>>> +++ b/xen/arch/arm/traps.c
>>> @@ -62,7 +62,7 @@ void __cpuinit init_traps(void)
>>>      WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
>>>
>>>      /* Setup hypervisor traps */
>>> -    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
>>> +    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TSC, HCR_EL2);
>>>      isb();
>>>  }
>>>
>>> =========================================
>>>
>>> If a bash process is assigned to a specific cpu with taskset, the process seems
>>> to always run without any issue.
>>>
>>>     taskset -c 0 ./configure
>>>
>>> I guess it's a caching issue, but each time I've tried to play with the caching
>>> policy Linux was not booting.
>>>
>>> Thanks in advance for any advice.
>>
>> Some thoughts:
>>
>> - Does dom0 run with Stage-2 translation? If so, you should be able
>> to disable caches in both Hyp mode and for dom0 by manipulating the
>> hyp registers to try and exclude caches. If Linux doesn't boot under
>> such a configuration, something else is completely broken, as it must be
>> transparent to your dom0.
>>
>> - Are you doing any swapping and/or page reclaiming?
>> I wouldn't assume so for dom0, but if you are, you need to maintain the
>> icache properly, since it can be aliasing, see
>> http://lxr.linux.no/linux+v3.9.4/arch/arm/kvm/mmu.c#L495 (I doubt this
>> is the case though)
>>
>> - All other cache accesses should be coherent across cores and are
>> physically indexed/physically tagged so I don't see how this could be
>> your issue.
>
> It was only an idea because I have noticed the memory was often corrupted.
>
>> - Do you always see the crash in user space or kernel space in dom0 or
>> is it all over the map?
>
>
> Only in user space in dom0.
>

Hmm, which kernel version is dom0 based on? Can you bisect the dom0
source to make sure it's not something introduced during development?

You have this in your tree, right: "9d1f5c ARM: 7641/1: memory: fix broken mmap..."?

-Christoffer