Date: Mon, 19 Dec 2016 11:20:24 +0100
From: Peter Zijlstra
To: Sergey Senozhatsky
Cc: Ingo Molnar, Linus Torvalds, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: [RFC][PATCH] spinlock_debug: report spinlock lockup from unlock
Message-ID: <20161219102024.GC3107@twins.programming.kicks-ass.net>
In-Reply-To: <20161217161911.4753-1-sergey.senozhatsky@gmail.com>

On Sun, Dec 18, 2016 at 01:19:11AM +0900, Sergey Senozhatsky wrote:
> There is a race window between the point when __spin_lock_debug()
> detects spinlock lockup and the time when CPU that caused the
> lockup receives its backtrace interrupt.
>
> Before __spin_lock_debug() triggers all_cpu_backtrace() it calls
> spin_dump() to printk() the current state of the lock and CPU
> backtrace. These printk() calls can take some time to print the
> messages to serial console, for instance (we are not talking
> about console_unlock() loop and a flood of messages from other
> CPUs, but just spin_dump() printk() and serial console).
>
> All those preparation steps can give CPU that caused the lockup
> enough time to run away, so when it receives a backtrace interrupt
> it can look completely innocent.
>
> The patch extends `struct raw_spinlock' with additional variable
> that stores jiffies of successful do_raw_spin_lock() and checks
> in debug_spin_unlock() whether the spin_lock has been locked for
> too long. So we will have a reliable backtrace from CPU that
> locked up and a reliable backtrace from CPU that caused the
> lockup.

But why? Also, why jiffies? That's a horrible source of time.
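A rough userspace sketch of the mechanism the quoted patch describes: record a timestamp when the lock is successfully taken, then compare against it on unlock. This is not the patch's code. The names dbg_spinlock, dbg_spin_lock(), dbg_spin_unlock() and HOLD_WARN_NS, the 100 ms threshold, and the use of a CLOCK_MONOTONIC nanosecond clock in place of jiffies are all assumptions made for illustration. Where the kernel would dump a backtrace, this sketch only prints a message.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() and nanosleep() */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold: report holders that kept the lock > 100 ms. */
#define HOLD_WARN_NS (100ULL * 1000 * 1000)

/* Hypothetical debug lock: a test-and-set flag plus the acquisition time. */
struct dbg_spinlock {
	atomic_flag locked;
	uint64_t acquired_ns; /* monotonic time of the last successful lock */
};

#define DBG_SPINLOCK_INIT { ATOMIC_FLAG_INIT, 0 }

/* Monotonic nanosecond clock, used here instead of jiffies (assumption). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

static void dbg_spin_lock(struct dbg_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		; /* spin until the flag was previously clear */
	l->acquired_ns = now_ns(); /* stamp only once the lock is ours */
}

/* Releases the lock; returns true if it was held longer than HOLD_WARN_NS. */
static bool dbg_spin_unlock(struct dbg_spinlock *l)
{
	uint64_t held = now_ns() - l->acquired_ns;
	bool too_long = held > HOLD_WARN_NS;

	if (too_long) /* the kernel would dump a backtrace of this CPU here */
		fprintf(stderr, "lock held for %llu ns\n",
			(unsigned long long)held);
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
	return too_long;
}
```

The hold time is checked before the flag is cleared: once the lock is released, another thread may take it and overwrite acquired_ns. Note also that this check can only fire when the holder finally unlocks; a holder that never releases the lock is never reported by it.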