From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751835AbdHBMt3 (ORCPT ); Wed, 2 Aug 2017 08:49:29 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:54166 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751180AbdHBMt1 (ORCPT ); Wed, 2 Aug 2017 08:49:27 -0400 Date: Wed, 2 Aug 2017 05:49:23 -0700 From: "Paul E. McKenney" To: Daniel Lezcano , john.stultz@linaro.org Cc: linux-kernel@vger.kernel.org, rostedt@goodmis.org Subject: Re: RCU stall when using function_graph Reply-To: paulmck@linux.vnet.ibm.com References: <20170801220405.GL3730@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20170801220405.GL3730@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17080212-0024-0000-0000-000002B95BC1 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007471; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000216; SDB=6.00896479; UDB=6.00448472; IPR=6.00676667; BA=6.00005506; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00016496; XFM=3.00000015; UTC=2017-08-02 12:49:25 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17080212-0025-0000-0000-000044F624C4 Message-Id: <20170802124923.GA9213@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-08-02_07:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1706020000 definitions=main-1708020208 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 01, 2017 at 03:04:05PM -0700, Paul E. McKenney wrote: > > Hi Paul, > > > > I have been trying to set the function_graph tracer for ftrace and each time I > > get a CPU stall. > > > > How to reproduce: > > ----------------- > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > > > This error appears with v4.13-rc3 and v4.12-rc6. > > > > Is it something already reported ? > > I have seen this sort of thing, but only when actually dumping the trace > out, and I though those got fixed. You are seeing this just accumulating > the trace? > > These RCU CPU stall warnings usually occur when something grabs hold of > a CPU for too long, as in 21 seconds or so. One way that they can happen > is excessive lock contention, another is having the kernel run through > too much data at one shot. > > Adding Steven Rostedt on CC for his thoughts. And Peter Zijlstra suggested this in response to a similar splat: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=15377ef4fe0c86eb7fa1099575b2e86357d99e42 Thanx, Paul > > [ ... ] > > > > [ 472.934986] INFO: rcu_preempt detected stalls on CPUs/tasks: > > [ 472.941302] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=3195 > > [ 472.948982] (detected by 3, t=6502 jiffies, g=549, c=548, q=10) > > [ 472.955180] Task dump for CPU 0: > > [ 472.958513] swapper/0 R running task 0 0 0 0x00000002 > > [ 550.954786] INFO: rcu_preempt detected stalls on CPUs/tasks: > > [ 550.960657] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=12926 > > [ 550.967897] (detected by 3, t=26007 jiffies, g=549, c=548, q=27) > > [ 550.974181] Task dump for CPU 0: > > [ 550.977513] swapper/0 R running task 0 0 0 0x00000002 > > [ 628.974788] INFO: rcu_preempt detected stalls on CPUs/tasks: > > [ 628.980659] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=22666 > > [ 628.987900] (detected by 3, t=45512 jiffies, g=549, c=548, q=43) > > [ 628.994184] Task dump for CPU 0: > > [ 628.997518] swapper/0 R running task 0 0 0 0x00000002 > > [ 706.994787] INFO: rcu_preempt detected stalls on CPUs/tasks: > > [ 707.000658] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=31785 > > [ 707.007899] (detected by 3, t=65017 jiffies, g=549, c=548, q=74) > > [ 707.014183] Task dump for CPU 0: > > [ 707.017516] swapper/0 R running task 0 0 0 0x00000002 > > [ 707.024902] rcu_preempt kthread starved for 1261 jiffies! g549 c548 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 > > [ 707.034548] rcu_preempt S 0 8 2 0x00000000 > > [ 726.049583] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds. > > [ 726.061772] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > > [ 726.074294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 726.089027] jbd2/mmcblk1p2- D 0 1120 2 0x00000000 > > [ 726.126737] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds. > > [ 726.140484] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > > [ 726.154515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 726.170140] rs:main Q:Reg D 0 2272 1 0x00000000 > > [ 726.205626] > > [ 726.205626] Showing all locks held in the system: > > [ 726.272952] > > [ 726.278194] ============================================= > > [ 726.278194] > > [ 785.014779] INFO: rcu_preempt detected stalls on CPUs/tasks: > > [ 785.020658] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=41387 > > [ 785.027905] (detected by 3, t=84524 jiffies, g=549, c=548, q=94) > > [ 785.034194] Task dump for CPU 0: > > [ 785.037532] swapper/0 R running task 0 0 0 0x00000002 > > [ 846.879753] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds. > > [ 846.892966] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > > [ 846.909597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 846.926726] jbd2/mmcblk1p2- D 0 1120 2 0x00000000 > > [ 846.981356] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds. > > [ 847.001593] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > > [ 847.017642] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 847.034135] rs:main Q:Reg D 0 2272 1 0x00000000 > > [ 847.077209] > > [ 847.077209] Showing all locks held in the system: > > [ 847.180736] > > [ 847.186753] ============================================= > > [ 847.186753] > > [ 863.034778] INFO: rcu_preempt detected stalls on CPUs/tasks: > > [ 863.040660] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=50881 > > [ 863.047907] (detected by 4, t=104030 jiffies, g=549, c=548, q=114) > > [ 863.054373] Task dump for CPU 0: > > [ 863.057711] swapper/0 R running task 0 0 0 0x00000002 > > [ 941.054780] INFO: rcu_preempt detected stalls on CPUs/tasks: > > [ 941.060659] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=59726 > > [ 941.067903] (detected by 7, t=123535 jiffies, g=549, c=548, q=303) > > [ 941.074370] Task dump for CPU 0: > > [ 941.077708] swapper/0 R running task 0 0 0 0x00000002 > > [ 967.711682] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds. > > [ 967.725731] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > > [ 967.743347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 967.765658] jbd2/mmcblk1p2- D 0 1120 2 0x00000000 > > [ 967.810595] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds. > > [ 967.825360] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > > [ 967.838595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 967.854460] rs:main Q:Reg D 0 2272 1 0x00000000 > > [ 967.897804] > > [ 967.897804] Showing all locks held in the system: > > [ 967.997067] > > [ 968.002570] ============================================= > > > > > > [ ... ] > > > > etc ...