From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1750903AbdEERME (ORCPT <rfc822;w@1wt.eu>);
        Fri, 5 May 2017 13:12:04 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:44421 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
        by vger.kernel.org with ESMTP id S1750784AbdEERMD (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 5 May 2017 13:12:03 -0400
Date: Fri, 5 May 2017 10:11:59 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170501165747.GA993@linux.vnet.ibm.com>
 <20170501183807.GA7054@linux.vnet.ibm.com>
 <20170501184402.GB8921@htj.duckdns.org>
 <20170501185819.GJ3956@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170501185819.GJ3956@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20170505171159.GA10296@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 01, 2017 at 11:58:19AM -0700, Paul E. McKenney wrote:
> On Mon, May 01, 2017 at 02:44:02PM -0400, Tejun Heo wrote:
> > Hello, Paul.
> > 
> > On Mon, May 01, 2017 at 11:38:07AM -0700, Paul E. McKenney wrote:
> > > On Mon, May 01, 2017 at 09:57:47AM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > > 
> > > > I am hitting this WARN_ON_ONCE() in process_one_work() and am wondering
> > > > what I did wrong to make this happen:
> > > 
> > > Oh, wait...  Rescuer, it says.  Might this be due to the fact that RCU's
> > > expedited grace periods block within a workqueue handler?  Might this
> > > in turn run the system out of workqueue kthreads?  If this is the likely
> > > cause, my approach would be to rework the expedited-grace-period workqueue
> > > handler to return when waiting for the grace period to complete, and to
> > > replace the current wakeup with a schedule_work() or something similar.
> > 
> > That should be completely fine.  It could just be that the rescuer
> > path has a bug around CPU hotplug handling.  Can you please confirm
> > either way on the cpuset usage?
> 
> I have no explicit cpuset usage or affinity of the workqueue handlers
> themselves.
> 
> However, this is thus far only happening in CONFIG_NO_HZ_FULL=y runs, in
> this case, with the kernel boot parameter nohz_full=2-9 out of 16 CPUs.
> IIRC, this sets up a "housekeeping" cpuset that pushes normal tasks away
> from the nohz_full CPUs.
> 
> I do build with CONFIG_HOTPLUG_CPU=y, and the test does a lot of
> hotplugging.  Also, other kthreads (but again, not the workqueue handlers)
> do a lot of explicit CPU-affinity manipulation.

Just following up...  I have hit this bug a couple of times over the
past few days.  Anything I can do to help?

							Thanx, Paul