From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752653AbeAWU5L (ORCPT ); Tue, 23 Jan 2018 15:57:11 -0500
Received: from mail-qt0-f193.google.com ([209.85.216.193]:46055 "EHLO
	mail-qt0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751681AbeAWU5K (ORCPT ); Tue, 23 Jan 2018 15:57:10 -0500
X-Google-Smtp-Source: AH8x226KOdgt7Z/NO2xUr56zFPWRbOvOVjWrvxmkKCFlrZ3vRe0WL/KnYZDZSvm8hh0gP3WogTU32g==
Date: Tue, 23 Jan 2018 12:57:06 -0800
From: Tejun Heo
To: Rik van Riel
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, Steven Rostedt, Sergey Senozhatsky, Petr Mladek
Subject: Re: [PATCH] lockdep: Avoid triggering hardlockup from debug_show_all_locks()
Message-ID: <20180123205706.GH1771050@devbig577.frc2.facebook.com>
References: <20180122220055.GB1771050@devbig577.frc2.facebook.com>
	<1516734237.31954.17.camel@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1516734237.31954.17.camel@fb.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

(cc'ing Steven, Sergey and Petr who are working on printk)

On Tue, Jan 23, 2018 at 02:03:57PM -0500, Rik van Riel wrote:
> On Mon, 2018-01-22 at 14:00 -0800, Tejun Heo wrote:
> > debug_show_all_locks() iterates all tasks and prints held locks while
> > holding tasklist_lock.  This can take a while on a slow console
> > device and may end up triggering the NMI hardlockup detector if
> > someone else ends up waiting for tasklist_lock.
> >
> > Touch the NMI watchdog while printing the held locks to avoid
> > spuriously triggering the hardlockup detector.
> >
> > Signed-off-by: Tejun Heo
>
> On this patch:
>
> Acked-by: Rik van Riel
>
>
> However, it seems like we run into things like
> this on a fairly regular (though not very frequent)
> basis. Would it make sense to go through the code
> and sprinkle around a few more touch_nmi_watchdog()
> calls?
>
> After all, there are maybe a few dozen places where
> we print out a lot of debugging information.

Yeah, it's ridiculous how often printk ends up escalating otherwise
recoverable situations into system crashes.  I don't know what the
right answer is.  For spurious NMI hardlockups, maybe auditing debug
paths and adding touch_nmi_watchdog() would be enough, but that is
also a pretty leaky approach.

Thanks.

-- 
tejun