From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752303AbaIMAin (ORCPT ); Fri, 12 Sep 2014 20:38:43 -0400
Received: from e35.co.us.ibm.com ([32.97.110.153]:44174 "EHLO
        e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751665AbaIMAim (ORCPT );
        Fri, 12 Sep 2014 20:38:42 -0400
Date: Fri, 12 Sep 2014 17:38:37 -0700
From: "Paul E. McKenney"
To: Fengguang Wu
Cc: Christoph Lameter, Shan Wei, Jet Chen, Su Tao, Yuanhan Liu, LKP,
        linux-kernel@vger.kernel.org, bobby.prani@gmail.com, Tejun Heo
Subject: Re: [rcu] BUG: unable to handle kernel NULL pointer dereference at 000000da
Message-ID: <20140913003837.GO4775@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20140901084403.GA18808@localhost>
        <20140912190238.GJ4775@linux.vnet.ibm.com>
        <20140912192659.GM4775@linux.vnet.ibm.com>
        <20140913002005.GA9550@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140913002005.GA9550@localhost>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14091300-6688-0000-0000-000004B9C018
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Sep 13, 2014 at 08:20:05AM +0800, Fengguang Wu wrote:
> On Fri, Sep 12, 2014 at 12:26:59PM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 12, 2014 at 02:19:57PM -0500, Christoph Lameter wrote:
> > > On Fri, 12 Sep 2014, Paul E. McKenney wrote:
> > >
> > > > So, I am not seeing this failure in my testing, but my best guess is
> > > > that the problem is due to the fact that force_quiescent_state() is
> > > > sometimes invoked with preemption enabled, which breaks __this_cpu_read()
> > > > though perhaps with very low probability. The common-case call (from
> > > > __call_rcu_core()) -does- have preemption disabled, in fact, it has
> > > > interrupts disabled.
> > >
> > > How could __this_cpu_read() break in a way that would make a difference to
> > > the code? There was no disabling/enabling of preemption before the patch
> > > and there is nothing like that after the patch. If there was a race then
> > > it still exists. The modification certainly cannot create a race.
> >
> > Excellent question. Yet Fengguang's tests show breakage.
> >
> > Fengguang, any possibility of a false positive here?
>
> Yes, it is possible. I find the first bad commit and its parent
> commit's kernels are built in 2 different machines which might
> cause subtle changes. I'll redo the bisect.

Thank you, Fengguang, and please let me know how it goes!

Thanx, Paul