From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751970AbdJCPFl (ORCPT ); Tue, 3 Oct 2017 11:05:41 -0400
Received: from mx1.redhat.com ([209.132.183.28]:47142 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750812AbdJCPFk (ORCPT ); Tue, 3 Oct 2017 11:05:40 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com D1DCB2CE912
Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com;
	dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com;
	spf=fail smtp.mailfrom=jpoimboe@redhat.com
Date: Tue, 3 Oct 2017 10:05:38 -0500
From: Josh Poimboeuf
To: Fengguang Wu
Cc: Byungchul Park, Ingo Molnar, "Peter Zijlstra (Intel)",
	linux-kernel@vger.kernel.org, LKP
Subject: Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer
	dereference at 000001f2
Message-ID: <20171003150538.57dq7xe6afikpzyp@treble>
References: <20171003140634.r2jzujgl62ox4uzh@wfg-t540p.sh.intel.com>
	<20171003143147.ypotpg4f3dql4gtf@treble>
	<20171003144136.ar7z3pavhvizfabs@treble>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20171003144136.ar7z3pavhvizfabs@treble>
User-Agent: Mutt/1.6.0.1 (2016-04-01)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.29]); Tue, 03 Oct 2017 15:05:40 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 03, 2017 at 09:41:36AM -0500, Josh Poimboeuf wrote:
> On Tue, Oct 03, 2017 at 09:31:47AM -0500, Josh Poimboeuf wrote:
> > On Tue, Oct 03, 2017 at 10:06:34PM +0800, Fengguang Wu wrote:
> > > Hi Byungchul,
> > >
> > > This patch triggers a NULL-dereference bug at update_stack_state().
> > > Although its parent commit also has a NULL-dereference bug, however
> > > the call stack looks rather different. Both dmesg files are attached.
> > >
> > > It also triggers this warning, which is being discussed in another
> > > thread, so CC Josh. The full dmesg attached, too.
> > >
> > > Please press Enter to activate this console.
> > > [ 138.605622] WARNING: kernel stack regs at be299c9a in procd:340 has bad 'bp' value 000001be
> > > [ 138.605627] unwind stack type:0 next_sp: (null) mask:0x2 graph_idx:0
> > > [ 138.605631] be299c9a: 299ceb00 (0x299ceb00)
> > > [ 138.605633] be299c9e: 2281f1be (0x2281f1be)
> > > [ 138.605634] be299ca2: 299cebb6 (0x299cebb6)
> >
> > I suspect the bug is in:
> >
> >   ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
> >
> > It converts the stack-allocated stack_trace struct from static to
> > non-static, yet still adds it to a list. Does this fix it?
>
> Actually, I spoke too soon. It's not actually adding the pointer to the
> list, it's copying its contents. So never mind...

I don't know the lockdep code, but one more comment from the peanut
gallery. This code looks suspect to me:

	/*
	 * Stop saving stack_trace if save_trace() was
	 * called at least once:
	 */
	if (save && ret == 2)
		save = NULL;

From looking at check_prev_add(), a return value of 2 doesn't
necessarily imply that save_trace() was called. If the
check_redundant() call returns 0, then check_prev_add() can return 2,
and the trace will still be uninitialized, but save will be set to NULL
even though save_trace() hasn't been called. Then a subsequent call to
check_prev_add() could add an uninitialized stack_trace struct to the
dependency list.

I could be wrong, but it's at least something the lockdep folks might
want to look at.
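To make the scenario above concrete, here is a minimal userspace sketch
of it. This is not the lockdep code: every name in it (model_trace,
model_save_trace, model_check_prev_add, model_check_prevs_add) is a
hypothetical stand-in, and the control flow is reduced to the two
"return 2" paths described above, on the assumption that the redundant
path returns before the trace is saved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct stack_trace; 'valid' records
 * whether the trace has been filled in. */
struct model_trace {
	bool valid;
};

typedef int (*save_fn)(struct model_trace *trace);

/* Stand-in for save_trace(): fills in the trace, nonzero on success. */
static int model_save_trace(struct model_trace *trace)
{
	trace->valid = true;
	return 1;
}

/* Number of dependencies recorded with a trace that was never saved. */
static int bad_entries;

/*
 * Simplified model of the two paths that return 2:
 *  - redundant link: return early, save() is never called;
 *  - new link: save the trace if asked to, then record the dependency.
 */
static int model_check_prev_add(bool redundant, struct model_trace *trace,
				save_fn save)
{
	assert(trace != NULL);

	if (redundant)
		return 2;

	if (save && !save(trace))
		return 0;

	/* Recording the dependency copies whatever the trace contains. */
	if (!trace->valid)
		bad_entries++;

	return 2;
}

/*
 * Model of the caller loop. redundant[i] says whether the i-th link is
 * redundant. Returns how many dependencies were recorded with an
 * uninitialized trace.
 */
static int model_check_prevs_add(const bool *redundant, size_t n)
{
	struct model_trace trace = { .valid = false };
	save_fn save = model_save_trace;
	size_t i;

	bad_entries = 0;
	for (i = 0; i < n; i++) {
		int ret = model_check_prev_add(redundant[i], &trace, save);

		if (!ret)
			break;

		/* The check in question: treats ret == 2 as "trace saved". */
		if (save && ret == 2)
			save = NULL;
	}
	return bad_entries;
}
```

In this model, a redundant link followed by a new one records one
dependency with an unsaved trace, while two new links in a row record
none. Whether the real code can reach the same state depends on details
the model leaves out.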
-- 
Josh