From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754804AbcK2Q3a (ORCPT <rfc822;w@1wt.eu>);
        Tue, 29 Nov 2016 11:29:30 -0500
Received: from mx2.suse.de ([195.135.220.15]:52239 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750922AbcK2Q3X (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 29 Nov 2016 11:29:23 -0500
Date: Tue, 29 Nov 2016 17:29:20 +0100
From: Petr Mladek <pmladek@suse.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
        Josh Poimboeuf <jpoimboe@redhat.com>,
        Vince Weaver <vincent.weaver@maine.edu>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Ingo Molnar <mingo@redhat.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        "dvyukov@google.com" <dvyukov@google.com>
Subject: Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start
Message-ID: <20161129162920.GF21230@pathway.suse.cz>
References: <alpine.DEB.2.20.1611241229180.25241@macbook-air>
 <20161128215411.fkis7bbimjy4v4j7@treble>
 <20161129004021.GL3924@linux.vnet.ibm.com>
 <20161129055241.6dy2dt4q4ptazk2s@treble>
 <20161129124323.GJ3092@twins.programming.kicks-ass.net>
 <20161129151004.GU3924@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161129151004.GU3924@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 2016-11-29 07:10:04, Paul E. McKenney wrote:
> On Tue, Nov 29, 2016 at 01:43:23PM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > 
> > > Did a little digging on git blame and found the following commit (which
> > > seems to be the cause of the KASAN warning and missing stack dump):
> > > 
> > >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > > 
> > > I presume this commit is still needed because of the NMI printk deadlock
> > > issues which were discussed at Kernel Summit.  I guess those issues need
> > > to be sorted out before the above commit can be reverted.
> > 
> > Also, I most always run with these here patches applied:
> > 
> >   https://lkml.kernel.org/r/20161018170830.405990950@infradead.org
> > 
> > People are very busy polishing the turd we call printk, but from where
> > I'm sitting it's terminally and unfixably broken.

I still hope that we could do better :-)


> > I should certainly add a revert of the above commit to the stack of
> > patches I carry.
> 
> This isn't making me feel particularly confident about switching RCU
> CPU stall warnings back to NMIs...  ;-)

IMHO, trigger_single_cpu_backtrace() is pretty safe at the moment.
It uses per-CPU buffers in a lockless way in NMI context. It even makes
sure that the buffers are flushed to the main log buffer and console
once the CPU is back out of NMI context.
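To make the "lockless per-CPU buffer" idea concrete, here is a minimal userspace sketch in C11. It is not the kernel's printk code: the names sim_nmi_buf, sim_nmi_store, NR_CPUS_SIM and SIM_NMI_BUF_SIZE are hypothetical, and it assumes NMIs on one CPU do not nest, so the only concurrent modifier of a buffer is a later flusher resetting its length. The writer copies its bytes first and only then publishes them by advancing len with compare-and-swap. If the buffer is full, the message is dropped instead of waiting on anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS_SIM      4    /* hypothetical number of simulated CPUs */
#define SIM_NMI_BUF_SIZE 256  /* hypothetical per-CPU buffer size */

/* One buffer per CPU; only that CPU's "NMI handler" appends to it. */
struct sim_nmi_buf {
    atomic_size_t len;                /* bytes committed so far */
    char buffer[SIM_NMI_BUF_SIZE];
};

static struct sim_nmi_buf sim_nmi_bufs[NR_CPUS_SIM];

/*
 * Store a message from simulated NMI context without taking any lock.
 * Data is written first, then published by a CAS on len. If a flusher
 * reset len in between, the CAS fails and we rewrite at the new offset.
 * Returns the number of bytes stored, or 0 if the message did not fit.
 */
static size_t sim_nmi_store(int cpu, const char *msg)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SIM);

    struct sim_nmi_buf *s = &sim_nmi_bufs[cpu];
    size_t n = strlen(msg);

    for (;;) {
        size_t old = atomic_load(&s->len);

        if (n > SIM_NMI_BUF_SIZE - old)
            return 0;                        /* full: drop, never block */
        memcpy(s->buffer + old, msg, n);     /* write the data first... */
        if (atomic_compare_exchange_strong(&s->len, &old, old + n))
            return n;                        /* ...then publish it */
        /* len changed under us (buffer was flushed): retry */
    }
}
```

Dropping on overflow is the design point: code running in NMI context must never wait for a lock that the interrupted code might already hold.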

In other words, the deadlocks in NMI context should be gone. The
NMI buffers are later flushed using the classic printk(). Therefore
the risk is the same as when you call printk() directly
in rcu_dump_cpu_stacks() now.
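The argument that the risk equals a direct printk() call can be sketched the same way. In this hypothetical userspace model, the names sim_nmi_buf, sim_classic_printk and sim_flush_nmi_buf are illustrative, and a pthread mutex stands in for the printk locking. The flush runs later in normal context. The kernel defers this step, for example via irq_work. It copies out the committed bytes, resets the length with compare-and-swap (retrying if an NMI appended meanwhile), and only then calls the locked printk-like path. The only lock is taken outside NMI, exactly as with a direct printk() call. The sketch assumes NMI writers advance len only after their bytes are in place.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SIM_NMI_BUF_SIZE 256   /* hypothetical per-CPU buffer size */
#define SIM_LOG_BUF_SIZE 4096  /* hypothetical main log buffer size */

/* Per-CPU NMI buffer: writers advance len only after writing their bytes. */
struct sim_nmi_buf {
    atomic_size_t len;
    char buffer[SIM_NMI_BUF_SIZE];
};

/* Main log buffer guarded by a lock, standing in for the printk() path. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static char log_buf[SIM_LOG_BUF_SIZE];
static size_t log_len;

/* Stand-in for classic printk(): it takes a lock, so never call it in NMI. */
static void sim_classic_printk(const char *text, size_t n)
{
    pthread_mutex_lock(&log_lock);
    if (n > SIM_LOG_BUF_SIZE - log_len)
        n = SIM_LOG_BUF_SIZE - log_len;     /* truncate rather than overflow */
    memcpy(log_buf + log_len, text, n);
    log_len += n;
    pthread_mutex_unlock(&log_lock);
}

/*
 * Runs in normal context after the NMI has returned. Returns the number
 * of bytes moved from the per-CPU buffer to the main log.
 */
static size_t sim_flush_nmi_buf(struct sim_nmi_buf *s)
{
    char tmp[SIM_NMI_BUF_SIZE];

    for (;;) {
        size_t n = atomic_load(&s->len);

        assert(n <= SIM_NMI_BUF_SIZE);
        if (n == 0)
            return 0;                       /* nothing to flush */
        memcpy(tmp, s->buffer, n);
        if (atomic_compare_exchange_strong(&s->len, &n, 0)) {
            sim_classic_printk(tmp, n);     /* the only locked step */
            return n;
        }
        /* an NMI appended meanwhile: copy again */
    }
}
```

Because the copy-out happens before the reset, a message that an NMI commits during the flush is either included in this round or left for the next one. It is never lost.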

Best Regards,
Petr