From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 20 Jun 2018 09:47:53 +0200 (CEST)
From: Thomas Gleixner
To: Ricardo Neri
cc: Ingo Molnar, "H. Peter Anvin", Andi Kleen, Ashok Raj, Borislav Petkov,
    Tony Luck, "Ravi V. Shankar", x86@kernel.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, Jacob Pan,
    "Rafael J. Wysocki", Don Zickus, Nicholas Piggin, Michael Ellerman,
    Frederic Weisbecker, Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
    Masami Hiramatsu, Peter Zijlstra, Andrew Morton, Philippe Ombredanne,
    Colin Ian King, Byungchul Park, "Paul E. McKenney", "Luis R. Rodriguez",
    Waiman Long, Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso,
    Christoffer Dall, Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
    David Rientjes, iommu@lists.linux-foundation.org
Subject: Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
In-Reply-To: <20180620001535.GA27531@voyager>
References: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>
    <1528851463-21140-18-git-send-email-ricardo.neri-calderon@linux.intel.com>
    <20180615020314.GA11625@voyager>
    <20180616005103.GC6659@voyager>
    <20180620001535.GA27531@voyager>
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 19 Jun 2018, Ricardo Neri wrote:
> On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
> > The status register is useless in case of MSI. MSI is edge triggered ....
> >
> > The only register which gives you proper information is the counter
> > register itself. That adds an massive overhead to each NMI, because the
> > counter register access is synchronized to the HPET clock with hardware
> > magic. Plus on larger systems, the HPET access is cross node and even
> > slower.
>
> It starts to sound that the HPET is too slow to drive the hardlockup detector.
>
> Would it be possible to envision a variant of this implementation? In this
> variant, the HPET only targets a single CPU.
> The actual hardlockup detector is implemented by this single CPU sending
> interprocessor interrupts to the rest of the CPUs.

And these IPIs must be NMIs which need to have a software based indicator
that the watchdog needs to be checked, which is going to create yet another
can of race conditions and in the worst case 'unknown NMI' splats.

Not pretty either.

Thanks,

	tglx
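The "software based indicator" concern in the reply can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not kernel code and not anything proposed in the thread: `NR_CPUS`, `watchdog_check_pending`, `request_watchdog_check()`, `watchdog_nmi_handler()` and `simulate_race()` are hypothetical names, and NMI delivery is modelled as a plain function call. It shows one possible interleaving in which an unrelated NMI consumes the flag first, so the NMI that the IPI actually raised finds nothing to claim and would be reported as unknown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this simulation */

/* Hypothetical per-CPU flag: "this NMI was sent so you check the watchdog". */
static atomic_bool watchdog_check_pending[NR_CPUS];

/* Sender side: publish the indicator. A real system would then send an
 * NMI IPI to the target CPU; here delivery is simulated separately. */
static void request_watchdog_check(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_store(&watchdog_check_pending[cpu], true);
}

/* Receiver side: claim the NMI only if the indicator was set, and clear
 * the indicator in the same atomic step. Returns true if claimed. */
static bool watchdog_nmi_handler(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return atomic_exchange(&watchdog_check_pending[cpu], false);
}

/* One possible bad interleaving (an assumption chosen for illustration):
 * an NMI from some other source reaches the CPU after the flag is set but
 * before the IPI's NMI does. Returns the number of unclaimed NMIs. */
int simulate_race(void)
{
    int unknown_nmis = 0;

    request_watchdog_check(1);      /* flag set, NMI IPI "in flight" */

    if (!watchdog_nmi_handler(1))   /* unrelated NMI arrives first and */
        unknown_nmis++;             /* is wrongly claimed by the watchdog */

    if (!watchdog_nmi_handler(1))   /* the IPI's NMI arrives: flag is */
        unknown_nmis++;             /* already clear, nobody claims it */

    return unknown_nmis;
}
```

Calling `simulate_race()` leaves one NMI unclaimed even though the flag itself is updated atomically, which suggests the difficulty lies in pairing each NMI with its cause rather than in the flag update.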