From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933745AbcIOGhx (ORCPT ); Thu, 15 Sep 2016 02:37:53 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:35158 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1764099AbcIOGhw (ORCPT ); Thu, 15 Sep 2016 02:37:52 -0400 Date: Thu, 15 Sep 2016 08:37:47 +0200 From: Ingo Molnar To: Andy Lutomirski , Linus Torvalds Cc: x86@kernel.org, Borislav Petkov , linux-kernel@vger.kernel.org, Brian Gerst , Jann Horn , Thomas Gleixner , "H. Peter Anvin" Subject: Re: [PATCH 08/12] x86/dumpstack: Pin the target stack in save_stack_trace_tsk() Message-ID: <20160915063747.GC13992@gmail.com> References: <7cd5e328dc75d8ccd912e5783665de34503f7c63.1473801993.git.luto@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7cd5e328dc75d8ccd912e5783665de34503f7c63.1473801993.git.luto@kernel.org> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Andy Lutomirski wrote: > This will prevent a crash if the target task dies before or while > dumping its stack once we start freeing task stacks early. > > Signed-off-by: Andy Lutomirski > --- > arch/x86/kernel/stacktrace.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c > index 4738f5e0f2ab..b3f32fbe3ba4 100644 > --- a/arch/x86/kernel/stacktrace.c > +++ b/arch/x86/kernel/stacktrace.c > @@ -79,9 +79,14 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace) > > void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) > { > + if (!try_get_task_stack(tsk)) > + return; > + > dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace); > if (trace->nr_entries < trace->max_entries) > trace->entries[trace->nr_entries++] = ULONG_MAX; > + > + put_task_stack(tsk); > } > EXPORT_SYMBOL_GPL(save_stack_trace_tsk); So I very much like the first half of the series, the thread_info merge into task_struct (yay!) and I am in the process of applying those bits to -tip. So the above not (yet) fully correct patch is in preparation to a later change you are doing in this series: [PATCH 11/12] sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK But I am having serious second thoughts about the fact that we now have to sprinke non-obvious try_get_task_stack()/put_task_stack() pairs into debugging code that never really needed anything like that, which pairs are easy to get wrong, easy to miss - and generally if we miss them will it will result in a slightly buggy, very unpleasant, racy, crashy behavior of the kernel. Can we just ... not do the delayed freeing? I mean, the cache hotness arguments don't look overly convincing to me: if we just start not using a piece of formerly cache hot memory then those cache lines will eventually decay naturally in whatever LRU scheme the CPU is using and will be reused without much harm done. It's just "another day in CPU land" that happens all the time. Yes, we could reduce the latency of the freeing and thus slightly improve the locality of other bits ... but is it really measurable or noticeable in any fashion. It's like task/thread fork/clone/exit is _that_ much of a fast path. And then there's also the whole CONFIG_THREAD_INFO_IN_TASK #ifdeffery spreading. I.e I just don't see the justification for this kind of object life time assymetry and hard to debug fragility. Am I missing something? Thanks, Ingo