From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754407AbcBBMX4 (ORCPT ); Tue, 2 Feb 2016 07:23:56 -0500
Received: from foss.arm.com ([217.140.101.70]:55257 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754314AbcBBMXy (ORCPT ); Tue, 2 Feb 2016 07:23:54 -0500
Date: Tue, 2 Feb 2016 12:23:18 +0000
From: Mark Rutland
To: Laura Abbott
Cc: Laura Abbott, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
Message-ID: <20160202122317.GA32305@leverpostej>
References: <1454111218-3461-1-git-send-email-labbott@fedoraproject.org>
	<1454111218-3461-3-git-send-email-labbott@fedoraproject.org>
	<20160201122911.GF674@leverpostej>
	<56AFCD09.8000807@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <56AFCD09.8000807@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 01, 2016 at 01:24:25PM -0800, Laura Abbott wrote:
> On 02/01/2016 04:29 AM, Mark Rutland wrote:
> >Hi,
> >
> >On Fri, Jan 29, 2016 at 03:46:57PM -0800, Laura Abbott wrote:
> >>
> >>ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
> >>pages for debugging purposes. This requires memory be mapped
> >>with PAGE_SIZE mappings since breaking down larger mappings
> >>at runtime will lead to TLB conflicts. Check if debug_pagealloc
> >>is enabled at runtime and if so, map everything with PAGE_SIZE
> >>pages. Implement the functions to actually map/unmap the
> >>pages at runtime.
> >>
> >>
> >>Signed-off-by: Laura Abbott
> >
> >I tried to apply atop of the arm64 for-next/pgtable branch, but git
> >wasn't very happy about that -- which branch/patches is this based on?
> >
> >I'm not sure if I'm missing something, have something I shouldn't, or if
> >my MTA is corrupting patches again...
> >
>
> Hmmm, I based it off of your arm64-pagetable-rework-20160125 tag and
> Ard's patch for vmalloc and set_memory_* . The patches seem to apply
> on the for-next/pgtable branch as well so I'm guessing you are missing
> Ard's patch.

Yup, that was it. I evidently was paying far too little attention as I'd
also missed the mm/ patch for the !CONFIG_DEBUG_PAGEALLOC case. Is there
anything else in mm/ that I've potentially missed?

I'm seeing a hang on Juno just after reaching userspace (splat below)
with debug_pagealloc=on.

It looks like something's gone wrong around find_vmap_area -- at least
one CPU is forever awaiting vmap_area_lock, and presumably some other
CPU has held it and gone into the weeds, leading to the RCU stalls and
NMI lockup warnings.

[ 31.037054] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 31.042684] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 31.050795] (detected by 1, t=5255 jiffies, g=340, c=339, q=50)
[ 31.056935] rcu_preempt kthread starved for 4838 jiffies! g340 c339 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 36.509055] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/2:2H:995]
[ 36.521059] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [systemd-udevd:1048]
[ 36.533056] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [systemd-udevd:1037]
[ 36.545055] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [systemd-udevd:1036]
[ 56.497055] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [upstart-file-br:1012]
[ 94.057052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 94.062671] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 94.070780] (detected by 1, t=21010 jiffies, g=340, c=339, q=50)
[ 94.076981] rcu_preempt kthread starved for 20593 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 157.077052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 157.082673] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 157.090782] (detected by 2, t=36765 jiffies, g=340, c=339, q=50)
[ 157.096986] rcu_preempt kthread starved for 36348 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 220.097052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 220.102670] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 220.110779] (detected by 2, t=52520 jiffies, g=340, c=339, q=50)
[ 220.116971] rcu_preempt kthread starved for 52103 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 283.117052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 283.122670] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 283.130779] (detected by 1, t=68275 jiffies, g=340, c=339, q=50)
[ 283.136973] rcu_preempt kthread starved for 67858 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0

Typically show-backtrace-all-active-cpus(l) gives me something like:

[ 183.282835] CPU: 0 PID: 998 Comm: systemd-udevd Tainted: G L 4.5.0-rc1+ #7
[ 183.290783] Hardware name: ARM Juno development board (r0) (DT)
[ 183.296659] task: ffffffc97437a400 ti: ffffffc973ec8000 task.ti: ffffffc973ec8000
[ 183.304095] PC is at _raw_spin_lock+0x34/0x48
[ 183.308421] LR is at find_vmap_area+0x24/0xa0
[ 183.312746] pc : [] lr : [] pstate: 60000145
[ 183.320092] sp : ffffffc973ecb6c0
[ 183.323382] x29: ffffffc973ecb6c0 x28: ffffffbde7d50300
[ 183.328662] x27: ffffffffffffffff x26: ffffffbde7d50300
[ 183.333941] x25: 000000097e513000 x24: 0000000000000001
[ 183.339219] x23: 0000000000000000 x22: 0000000000000001
[ 183.344498] x21: ffffffc000a6dd90 x20: ffffffc000a6d000
[ 183.349778] x19: ffffffc97540c000 x18: 0000007fc4e8b960
[ 183.355057] x17: 0000007fac3088d4 x16: ffffffc0001be448
[ 183.360336] x15: 003b9aca00000000 x14: 0032aa26d4000000
[ 183.365614] x13: ffffffffa94f64df x12: 0000000000000018
[ 183.370894] x11: ffffffc97eecd730 x10: 0000000000000030
[ 183.376173] x9 : ffffffbde7d50340 x8 : ffffffc0008556a0
[ 183.381451] x7 : ffffffc0008556b8 x6 : ffffffc0008556d0
[ 183.386729] x5 : ffffffc0009d2000 x4 : 0000000000000001
[ 183.392008] x3 : 000000000000d033 x2 : 000000000000000b
[ 183.397286] x1 : 00000000d038d033 x0 : ffffffc000a6dd90
[ 183.402563]

I'll have a go with lock debugging. Otherwise do you have any ideas?

Thanks,
Mark.
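
For reference, a minimal sketch (not part of the original report) of how
the soft lockup lines in the splat above could be tallied per CPU when
triaging a log like this. The sample lines are copied from the splat; the
regular expression and the helper name stuck_cpus are assumptions made for
illustration, not an existing tool.

```python
import re

# Sample lines copied from the splat above.
LOG = """\
[ 36.509055] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/2:2H:995]
[ 36.521059] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [systemd-udevd:1048]
[ 36.533056] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [systemd-udevd:1037]
[ 36.545055] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [systemd-udevd:1036]
[ 56.497055] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [upstart-file-br:1012]
"""

# Hypothetical pattern for the watchdog message format seen in this log:
# CPU number, stuck duration in seconds, then "[comm:pid]".
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)


def stuck_cpus(log):
    """Map CPU number -> (comm, pid, seconds) for each soft lockup line."""
    result = {}
    for line in log.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            cpu, secs, comm, pid = match.groups()
            result[int(cpu)] = (comm, int(pid), int(secs))
    return result


if __name__ == "__main__":
    for cpu, (comm, pid, secs) in sorted(stuck_cpus(LOG).items()):
        print(f"CPU{cpu}: {comm} (pid {pid}) stuck for {secs}s")
```

On the excerpt above this lists CPUs 1 to 5 as soft-locked, while the RCU
stall messages name CPU 0, which is the CPU shown in the backtrace.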