Date: Wed, 23 Oct 2019 15:47:57 +0200 (CEST)
From: Thomas Gleixner
To: Cyrill Gorcunov
cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm@kvack.org, Catalin Marinas
Subject: Re: [BUG -tip] kmemleak and stacktrace cause page faul
References: <20191019114421.GK9698@uranus.lan> <20191022142325.GD12121@uranus.lan> <20191022145619.GE12121@uranus.lan>

On Wed, 23 Oct 2019, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> Ergo ep must be a valid pointer pointing to the statically allocated and
> statically initialized estack_pages array.
>
> 	/* Guard page? */
> 	if (!ep->size)
>
> How on earth can dereferencing ep crash the machine?
>
> 		return false;
>
> That does not make any sense.
>
> Surely, we should not even try to decode exception stack when
> cea_exception_stacks is not yet initialized, but that does not explain
> anything what you are observing.

So looking at your actual crash:

[    0.027246] BUG: unable to handle page fault for address: 0000000000001ff0

So this dereferences the stack pointer address.

[    0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888

ep correctly points to estack_pages[1]. That lookup is bogus because 0x1010
is not a valid stack value, but dereferencing ep does not make it crash.

The crash is farther down:

	end = begin + (unsigned long)ep->size;

==> end = 0x2000

	regs = (struct pt_regs *)end - 1;

==> regs = 0x2000 - sizeof(struct pt_regs), which puts &regs->sp at 0x1ff0

	info->type	= ep->type;
	info->begin	= (unsigned long *)begin;
	info->end	= (unsigned long *)end;

---->	info->next_sp	= (unsigned long *)regs->sp;

This is the crashing instruction, trying to access 0x1ff0.

And you are right, this happens because cea_exception_stacks is not yet
initialized, which makes begin = 0 and therefore points into nirvana.

So the fix is trivial.

Thanks,

	tglx

8<------------
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)