From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752307AbdK2WHH (ORCPT ); Wed, 29 Nov 2017 17:07:07 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:56152 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751883AbdK2WHF (ORCPT ); Wed, 29 Nov 2017 17:07:05 -0500 Date: Wed, 29 Nov 2017 14:07:03 -0800 From: "Paul E. McKenney" To: Thomas Gleixner Cc: kernel test robot , LKML , lkp@01.org Subject: Re: [lkp-robot] [torture] b151f93a71: INFO:rcu_preempt_detected_stalls_on_CPUs/tasks Reply-To: paulmck@linux.vnet.ibm.com References: <20171127215735.GP3624@linux.vnet.ibm.com> <20171128164144.GS3624@linux.vnet.ibm.com> <20171128170759.GU3624@linux.vnet.ibm.com> <20171128173554.GV3624@linux.vnet.ibm.com> <20171128204619.GA16740@linux.vnet.ibm.com> <20171128210810.GA22726@linux.vnet.ibm.com> <20171129190819.GA18159@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171129190819.GA18159@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17112922-0040-0000-0000-000003CA28E9 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00008124; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000242; SDB=6.00953085; UDB=6.00481537; IPR=6.00733164; BA=6.00005721; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00018258; XFM=3.00000015; UTC=2017-11-29 22:07:04 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17112922-0041-0000-0000-000007BF573A Message-Id: <20171129220703.GA12908@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-11-29_08:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1711290286 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 29, 2017 at 11:08:19AM -0800, Paul E. McKenney wrote: > On Tue, Nov 28, 2017 at 01:08:10PM -0800, Paul E. McKenney wrote: > > On Tue, Nov 28, 2017 at 12:46:19PM -0800, Paul E. McKenney wrote: > > > On Tue, Nov 28, 2017 at 09:35:54AM -0800, Paul E. McKenney wrote: > > > > On Tue, Nov 28, 2017 at 06:10:08PM +0100, Thomas Gleixner wrote: > > > > > On Tue, 28 Nov 2017, Paul E. McKenney wrote: > > > > > > On Tue, Nov 28, 2017 at 05:47:35PM +0100, Thomas Gleixner wrote: > > > > > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > > > > > > index db774b0f217e..a3321bb565db 100644 > > > > > > --- a/kernel/time/timer.c > > > > > > +++ b/kernel/time/timer.c > > > > > > @@ -1803,7 +1803,7 @@ signed long __sched schedule_timeout(signed long timeout) > > > > > > idx = timer_get_idx(&timer.timer); > > > > > > idx_now = calc_wheel_index(j, base->clk); > > > > > > raw_spin_unlock_irqrestore(&base->lock, flags); > > > > > > - pr_info("%s: Waylayed timer idx: %u idx_now: %u\n", __func__, idx, idx_now); > > > > > > + pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %u idx_now: %u\n", __func__, base->clk, j, base->next_expiry, timer.timer.flags, timer.timer.expires, idx, idx_now); > > > > > > > > > > Please print idx and idx_now as hex values. It's simpler to decode that way. > > > > > > > > Here you go! Starting tests at this end, focusing on TREE01 and TREE04. > > > > BTW, TREE04 doesn't do any CPU hotplug, providing a counterexample to > > > > my long-held assumption that this only happened in the presence of CPU > > > > hotplug operations. > > > > > > And here is output with changes discussed on IRC. TREE04 managed to > > > have not one but two overlapping RCU CPU stall warnings, one for RCU-bh > > > and the second for RCU-sched. TREE04 and TREE04. HZ=1000. > > > > And here is the full patch, in all its lack of aesthetic appeal. > > And here is the list of waylaid timers from last night's testing. The big > pile of them from TREE01 at the end is due to wakeups from kthread_stop(), > I am guessing. The TREE04 run only had two of them, but they seem reliable > enough that I just might be able to bisect. I will try that. And it converged to 5c4991e24c69 ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL"), which is a bit hard to believe. Please see below for the log. I will be retesting some of the allegedly good commits, just in case. > Did your setup reproduce the problem? Thanx, Paul ------------------------------------------------------------------------ # bad: [4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323] Linux 4.15-rc1 # good: [bebc6082da0a9f5d47a1ea2edc099bf671058bd4] Linux 4.14 git bisect start 'v4.15-rc1' 'v4.14' # bad: [1be2172e96e33bfa22a5c7a651f768ef30ce3984] Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux git bisect bad 1be2172e96e33bfa22a5c7a651f768ef30ce3984 # bad: [2cd83ba5bede2f72cc6c79a19a1bddf576b50e88] Merge tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio git bisect bad 2cd83ba5bede2f72cc6c79a19a1bddf576b50e88 # bad: [449fcf3ab0baf3dde9952385e6789f2ca10c3980] Merge tag 'staging-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging git bisect bad 449fcf3ab0baf3dde9952385e6789f2ca10c3980 # bad: [43ff2f4db9d0f76452b77cfa645f02b471143b24] Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 43ff2f4db9d0f76452b77cfa645f02b471143b24 # good: [f08d8bcc12de5a153e587027e77de83662eefb8a] Merge tag 'please-pull-gettime_vsyscall_update' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux git bisect good f08d8bcc12de5a153e587027e77de83662eefb8a # good: [31486372a1e9a66ec2e9e2903b8792bba7e503e1] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect good 31486372a1e9a66ec2e9e2903b8792bba7e503e1 # good: [9275b933d409d3a4efa08102ca813557b93fb0b9] resource: Fix resource_size.cocci warnings git bisect good 9275b933d409d3a4efa08102ca813557b93fb0b9 # bad: [765cc3a4b224e22bf524fabe40284a524f37cdd0] sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds git bisect bad 765cc3a4b224e22bf524fabe40284a524f37cdd0 # good: [93824900a2e242766f5fe6ae7697e3d7171aa234] sched/fair: Search a task from the tail of the queue git bisect good 93824900a2e242766f5fe6ae7697e3d7171aa234 # good: [7863406143d8bbbbda07a61285c5f4c217908dfd] sched/isolation: Move housekeeping related code to its own file git bisect good 7863406143d8bbbbda07a61285c5f4c217908dfd # bad: [de201559df872f83d0c08fb4effe3efd28e6cbc8] sched/isolation: Introduce housekeeping flags git bisect bad de201559df872f83d0c08fb4effe3efd28e6cbc8 # good: [7e56a1cf4b28f5739526877b8dbad623fae2e4e7] sched/isolation: Make the housekeeping cpumask private git bisect good 7e56a1cf4b28f5739526877b8dbad623fae2e4e7 # good: [204c083a009378dfa751175b5fcddc75988bab6c] sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu() git bisect good 204c083a009378dfa751175b5fcddc75988bab6c # bad: [5c4991e24c69737bd41fc2737b1e3980abbf73f9] sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL git bisect bad 5c4991e24c69737bd41fc2737b1e3980abbf73f9 # first bad commit: [5c4991e24c69737bd41fc2737b1e3980abbf73f9] sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============3156449320039556488==" MIME-Version: 1.0 From: Paul E. McKenney To: lkp@lists.01.org Subject: Re: [lkp-robot] [torture] b151f93a71: INFO:rcu_preempt_detected_stalls_on_CPUs/tasks Date: Wed, 29 Nov 2017 14:07:03 -0800 Message-ID: <20171129220703.GA12908@linux.vnet.ibm.com> In-Reply-To: <20171129190819.GA18159@linux.vnet.ibm.com> List-Id: --===============3156449320039556488== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Wed, Nov 29, 2017 at 11:08:19AM -0800, Paul E. McKenney wrote: > On Tue, Nov 28, 2017 at 01:08:10PM -0800, Paul E. McKenney wrote: > > On Tue, Nov 28, 2017 at 12:46:19PM -0800, Paul E. McKenney wrote: > > > On Tue, Nov 28, 2017 at 09:35:54AM -0800, Paul E. McKenney wrote: > > > > On Tue, Nov 28, 2017 at 06:10:08PM +0100, Thomas Gleixner wrote: > > > > > On Tue, 28 Nov 2017, Paul E. McKenney wrote: > > > > > > On Tue, Nov 28, 2017 at 05:47:35PM +0100, Thomas Gleixner wrote: > > > > > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > > > > > > index db774b0f217e..a3321bb565db 100644 > > > > > > --- a/kernel/time/timer.c > > > > > > +++ b/kernel/time/timer.c > > > > > > @@ -1803,7 +1803,7 @@ signed long __sched schedule_timeout(sign= ed long timeout) > > > > > > idx =3D timer_get_idx(&timer.timer); > > > > > > idx_now =3D calc_wheel_index(j, base->clk); > > > > > > raw_spin_unlock_irqrestore(&base->lock, flags); > > > > > > - pr_info("%s: Waylayed timer idx: %u idx_now: %u\n", __func__= , idx, idx_now); > > > > > > + pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx ba= se->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %u idx_now= : %u\n", __func__, base->clk, j, base->next_expiry, timer.timer.flags, time= r.timer.expires, idx, idx_now); > > > > > = > > > > > Please print idx and idx_now as hex values. It's simpler to decod= e that way. > > > > = > > > > Here you go! Starting tests at this end, focusing on TREE01 and TR= EE04. > > > > BTW, TREE04 doesn't do any CPU hotplug, providing a counterexample = to > > > > my long-held assumption that this only happened in the presence of = CPU > > > > hotplug operations. > > > = > > > And here is output with changes discussed on IRC. TREE04 managed to > > > have not one but two overlapping RCU CPU stall warnings, one for RCU-= bh > > > and the second for RCU-sched. TREE04 and TREE04. HZ=3D1000. > > = > > And here is the full patch, in all its lack of aesthetic appeal. > = > And here is the list of waylaid timers from last night's testing. The big > pile of them from TREE01 at the end is due to wakeups from kthread_stop(), > I am guessing. The TREE04 run only had two of them, but they seem reliab= le > enough that I just might be able to bisect. I will try that. And it converged to 5c4991e24c69 ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=3Dy config from CONFIG_NO_HZ_FULL"), which is a bit hard to believe. Please see below for the log. I will be retesting some of the allegedly good commits, just in case. > Did your setup reproduce the problem? Thanx, Paul ------------------------------------------------------------------------ # bad: [4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323] Linux 4.15-rc1 # good: [bebc6082da0a9f5d47a1ea2edc099bf671058bd4] Linux 4.14 git bisect start 'v4.15-rc1' 'v4.14' # bad: [1be2172e96e33bfa22a5c7a651f768ef30ce3984] Merge tag 'modules-for-v4= .15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux git bisect bad 1be2172e96e33bfa22a5c7a651f768ef30ce3984 # bad: [2cd83ba5bede2f72cc6c79a19a1bddf576b50e88] Merge tag 'iommu-v4.15-rc= 1' of git://github.com/awilliam/linux-vfio git bisect bad 2cd83ba5bede2f72cc6c79a19a1bddf576b50e88 # bad: [449fcf3ab0baf3dde9952385e6789f2ca10c3980] Merge tag 'staging-4.15-r= c1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging git bisect bad 449fcf3ab0baf3dde9952385e6789f2ca10c3980 # bad: [43ff2f4db9d0f76452b77cfa645f02b471143b24] Merge branch 'x86-platfor= m-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 43ff2f4db9d0f76452b77cfa645f02b471143b24 # good: [f08d8bcc12de5a153e587027e77de83662eefb8a] Merge tag 'please-pull-g= ettime_vsyscall_update' of git://git.kernel.org/pub/scm/linux/kernel/git/ae= gl/linux git bisect good f08d8bcc12de5a153e587027e77de83662eefb8a # good: [31486372a1e9a66ec2e9e2903b8792bba7e503e1] Merge branch 'perf-core-= for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect good 31486372a1e9a66ec2e9e2903b8792bba7e503e1 # good: [9275b933d409d3a4efa08102ca813557b93fb0b9] resource: Fix resource_s= ize.cocci warnings git bisect good 9275b933d409d3a4efa08102ca813557b93fb0b9 # bad: [765cc3a4b224e22bf524fabe40284a524f37cdd0] sched/core: Optimize sche= d_feat() for !CONFIG_SCHED_DEBUG builds git bisect bad 765cc3a4b224e22bf524fabe40284a524f37cdd0 # good: [93824900a2e242766f5fe6ae7697e3d7171aa234] sched/fair: Search a tas= k from the tail of the queue git bisect good 93824900a2e242766f5fe6ae7697e3d7171aa234 # good: [7863406143d8bbbbda07a61285c5f4c217908dfd] sched/isolation: Move ho= usekeeping related code to its own file git bisect good 7863406143d8bbbbda07a61285c5f4c217908dfd # bad: [de201559df872f83d0c08fb4effe3efd28e6cbc8] sched/isolation: Introduc= e housekeeping flags git bisect bad de201559df872f83d0c08fb4effe3efd28e6cbc8 # good: [7e56a1cf4b28f5739526877b8dbad623fae2e4e7] sched/isolation: Make th= e housekeeping cpumask private git bisect good 7e56a1cf4b28f5739526877b8dbad623fae2e4e7 # good: [204c083a009378dfa751175b5fcddc75988bab6c] sched/isolation: Rename = is_housekeeping_cpu() to housekeeping_cpu() git bisect good 204c083a009378dfa751175b5fcddc75988bab6c # bad: [5c4991e24c69737bd41fc2737b1e3980abbf73f9] sched/isolation: Split ou= t new CONFIG_CPU_ISOLATION=3Dy config from CONFIG_NO_HZ_FULL git bisect bad 5c4991e24c69737bd41fc2737b1e3980abbf73f9 # first bad commit: [5c4991e24c69737bd41fc2737b1e3980abbf73f9] sched/isolat= ion: Split out new CONFIG_CPU_ISOLATION=3Dy config from CONFIG_NO_HZ_FULL --===============3156449320039556488==--