From: Andy Lutomirski
Date: Fri, 22 Jul 2016 17:31:47 -0700
Subject: Re: [PATCH 00/19] x86/dumpstack: rewrite x86 stack dump code
To: Linus Torvalds
Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    "the arch/x86 maintainers", Linux Kernel Mailing List, Steven Rostedt,
    Brian Gerst, Kees Cook, Peter Zijlstra, Frederic Weisbecker,
    Byungchul Park
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 22, 2016 at 5:22 PM, Linus Torvalds wrote:
>
> So without having yet looked at the code, I want people to understand
> that to a very real degree, the stack tracer that the *oopsing* code
> (ie what all the usual kernel fault handlers use) is very very special
> code and needs to be handled very carefully, and needs to be extra
> robust, even in the presence of stack corruption, and even in the
> presence of the dwarf info being totally corrupted. Because we've very
> much had both things happen.
>
> It is very possible that we should have two different stack tracers -
> the stupid "for oopses only" code that doesn't necessarily give the
> perfect trace, but is very anal and happily gives old stale addresses
> (which can be very useful for seeing what happened just before the
> "real" stack trace), and then a separate stack trace engine that is
> clever and gets things right, and if that one faults it can depend on
> the normal kernel fault handling picking up the pieces.

I think that Josh's code has the potential to be extremely robust
*and* give more correct results when possible. One thing I intend to
review when v2 shows up is that it's as conservative as it needs to be
to avoid ever dereferencing an out-of-bounds pointer. And Josh's oops
printer carefully walks and prints out all addresses on the stack
(complete with question marks) even if the unwinder doesn't find them.

>
> Yes, the current stack tracer is crufty. No, it's not perfect. But it
> is very well tested, and has held up. That should not be dismissed.
>

I think you may be giving the current tracer slightly more credit than
it's due. In my stack guard page patchset, I fixed two separate
issues, one of which caused recursive faults and one of which caused
it to output nothing at all. So maybe *now* it's very robust :)

But it's still an unmaintainable mess IMO, and Josh's patchset helps a
*lot*.

--Andy
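
For anyone following along, the behaviour discussed in this message (scan
every word on the stack, bounds-check each pointer before dereferencing it,
and put a question mark in front of addresses the unwinder did not confirm)
can be roughly sketched in userspace C. This is not Josh's code and not
kernel code. TEXT_START, TEXT_END, scan_stack() and the helpers are
hypothetical names made up for illustration, the "stack" is a plain array,
and the unwinder's results are simply passed in as a list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical text range, invented for this sketch. */
#define TEXT_START ((uintptr_t)0x1000)
#define TEXT_END   ((uintptr_t)0x2000)

/* True if addr could be a return address into the pretend text range. */
static int looks_like_text(uintptr_t addr)
{
    return addr >= TEXT_START && addr < TEXT_END;
}

/* True if p points at a whole, aligned word inside [base, base + nwords). */
static int in_stack(const uintptr_t *base, size_t nwords, const uintptr_t *p)
{
    uintptr_t lo = (uintptr_t)base;
    uintptr_t hi = lo + nwords * sizeof(uintptr_t);
    uintptr_t a = (uintptr_t)p;

    return a >= lo && a < hi && (a - lo) % sizeof(uintptr_t) == 0;
}

/* True if the (pretend) unwinder reported addr as part of the real trace. */
static int was_unwound(const uintptr_t *reliable, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (reliable[i] == addr)
            return 1;
    return 0;
}

/*
 * Scan from sp to the end of the stack and append one line to out for every
 * word that looks like a text address. Addresses the unwinder did not
 * confirm get a "? " prefix. Returns the number of addresses written.
 */
int scan_stack(const uintptr_t *base, size_t nwords, const uintptr_t *sp,
               const uintptr_t *reliable, size_t nreliable,
               char *out, size_t outsz)
{
    size_t used = 0;
    int count = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    /* Validate the pointer before every dereference; stop at the first bad one. */
    for (const uintptr_t *p = sp; in_stack(base, nwords, p); p++) {
        uintptr_t word = *p;

        if (!looks_like_text(word))
            continue;

        int n = snprintf(out + used, outsz - used, "%s0x%lx\n",
                         was_unwound(reliable, nreliable, word) ? "" : "? ",
                         (unsigned long)word);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';   /* drop the partial line and stop */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

A bogus starting pointer (NULL, misaligned, or outside the array) fails the
in_stack() check before the first dereference, so the scan returns zero
entries instead of faulting. Stale text addresses left on the stack still
show up in the output, just with the "? " prefix.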