From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86382C77B7C for ; Fri, 5 May 2023 16:35:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231615AbjEEQfv (ORCPT ); Fri, 5 May 2023 12:35:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233003AbjEEQfq (ORCPT ); Fri, 5 May 2023 12:35:46 -0400 Received: from mail-qk1-x733.google.com (mail-qk1-x733.google.com [IPv6:2607:f8b0:4864:20::733]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E0AA191C8 for ; Fri, 5 May 2023 09:35:45 -0700 (PDT) Received: by mail-qk1-x733.google.com with SMTP id af79cd13be357-75178b082a5so91664085a.1 for ; Fri, 05 May 2023 09:35:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1683304543; x=1685896543; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=EpDnC2craXa71PgQOtY/fZMYoEOsNHjFRztUygfINOs=; b=nDSG83gvAxsI1q5Lf2iXaJn5kgyIzRmDZ88V/iuExJBMKAZ8ItX2XgzNS+HhRZBT4Z CYf8PwhRLPKs/vqjZugnfa99wmW2iBYXxN31X8IFnP+Q89BrEBQjJhUhpn7U2Ng5zkvx AMspwL2r6mcXM74FVBZXvqTKaYpGltgFnBVwI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683304543; x=1685896543; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EpDnC2craXa71PgQOtY/fZMYoEOsNHjFRztUygfINOs=; b=G4+1or+zvIJgX7UK27BmMEZ5ZsQvOgaUaCAKbRZjcp69pXQiLaA2Qso97N1hPS+hjb sV09bsWcWtS8lL+VUc4+Zz7g6cQSo/f2k1veNAR1dMWRQogOlICiwBHwk7NLLL+Y5a8H 
wJRpK8V81Vw6+BgSBmHGhIlE8WqyYWB8R41Z/K0KjFJLiyoMdaF7BFoWl0eOTYzm7mBW qQXrrS/hv/hzgQW/pqLfjZCkYVeHdmYuFn5AlOrnOb8+lLmSW7xHQaiIVTsGfWoGxQaQ xSdmptFybSUqQxkoyYHwsI+d13C9r0dAigxTv58e1bqw5wq2cH7TTN04tLUbQA2lUmRa bgBw== X-Gm-Message-State: AC+VfDxJ2Z1LF2Pl1rhyESX2OLnjk6cSHLvhc8Ne8t7gxfxd+7UtW5mg wN8rPUsDfflVGnWfHRPwbySX1RZQ583COgENsbc= X-Google-Smtp-Source: ACHHUZ6aVByVT71VYjq7a5kjTVRwv631Z0CUVIBPxn0M3ezeq8I87+gqWsD1WChnyrEnyV2C1Q7MaA== X-Received: by 2002:a05:6214:20a5:b0:5fd:7701:88c5 with SMTP id 5-20020a05621420a500b005fd770188c5mr2303408qvd.6.1683304542745; Fri, 05 May 2023 09:35:42 -0700 (PDT) Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com. [209.85.160.174]) by smtp.gmail.com with ESMTPSA id o16-20020a0ce410000000b005dd8b93459esm712383qvl.54.2023.05.05.09.35.39 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 05 May 2023 09:35:40 -0700 (PDT) Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-3f0a2f8216fso1093211cf.0 for ; Fri, 05 May 2023 09:35:39 -0700 (PDT) X-Received: by 2002:a05:622a:1981:b0:3ed:210b:e698 with SMTP id u1-20020a05622a198100b003ed210be698mr317734qtc.7.1683304539483; Fri, 05 May 2023 09:35:39 -0700 (PDT) MIME-Version: 1.0 References: <20230504221349.1535669-1-dianders@chromium.org> <20230504151100.v4.13.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid> In-Reply-To: From: Doug Anderson Date: Fri, 5 May 2023 09:35:25 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v4 13/17] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs To: Nicholas Piggin Cc: Petr Mladek , Andrew Morton , Sumit Garg , Mark Rutland , Matthias Kaehlcke , Stephane Eranian , Stephen Boyd , ricardo.neri@intel.com, Tzung-Bi Shih , Lecopzer Chen , kgdb-bugreport@lists.sourceforge.net, Masayoshi Mizuma , Guenter Roeck , Pingfan Liu , Andi Kleen , Ian Rogers , linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, ito-yuichi@fujitsu.com, Randy Dunlap , Chen-Yu Tsai , 
christophe.leroy@csgroup.eu, davem@davemloft.net, sparclinux@vger.kernel.org, mpe@ellerman.id.au, Will Deacon , ravi.v.shankar@intel.com, linuxppc-dev@lists.ozlabs.org, Marc Zyngier , Catalin Marinas , Daniel Thompson , Colin Cross Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-perf-users@vger.kernel.org

Hi,

On Thu, May 4, 2023 at 7:36 PM Nicholas Piggin wrote:
>
> On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> > From: Colin Cross
> >
> > Implement a hardlockup detector that doesn't need any extra
> > arch-specific support code to detect lockups. Instead of using
> > something arch-specific we will use the buddy system, where each CPU
> > watches out for another one. Specifically, each CPU will use its
> > softlockup hrtimer to check that the next CPU is processing hrtimer
> > interrupts by verifying that a counter is increasing.
>
> Powerpc's watchdog has an SMP checker, did you see it?

No, I wasn't aware of it. Interesting, it seems to basically enable both types of hardlockup detectors together. If that really catches more lockups, it seems like we could do the same thing for the buddy system. If people want, I don't think it would be very hard to make the buddy system _not_ exclusive of the perf system. Instead of having the buddy system implement the "weak" functions I could just call the buddy functions in the right places directly and leave the "weak" functions for a more traditional hardlockup detector to implement. Opinions?

Maybe after all this lands, the powerpc watchdog could move to use the common code? As evidenced by this patch series, there's not really a reason for the SMP detection to be platform specific.

> It's all to
> all rather than buddy which makes it more complicated but arguably
> bit better functionality.

Can you come up with an example crash where the "all to all" would work better than the simple buddy system provided by this patch? It seems like they would be equivalent, but I could be missing something. Specifically they both need at least one non-locked-up CPU to detect a problem. If one or more CPUs is locked up then we'll always detect it. I suppose maybe you could provide a better error message at lockup time saying that several CPUs were locked up and that could be helpful. For now, I'd keep the current buddy system the way it is and if you want to provide a patch improving things to be "all-to-all" in the future that would be interesting to review.
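
The equivalence argument above can be checked with a small model. What follows is a rough Python sketch, not code from the patch series: it assumes a hypothetical 4-CPU system where every CPU is online, each CPU's buddy is simply the next CPU in a ring, and a locked-up CPU stops both incrementing its own interrupt counter and checking its buddy. The names `buddy_of`, `buddy_reports` and `all_to_all_reports` are made up for this illustration, and timing thresholds and CPU hotplug are ignored.

```python
from itertools import product

NR_CPUS = 4  # hypothetical small system, chosen only for this sketch


def buddy_of(cpu):
    """Assumed pairing: each CPU watches the next one, wrapping around."""
    return (cpu + 1) % NR_CPUS


def buddy_reports(locked):
    """Return the locked-up CPUs that some live buddy would report."""
    interrupts = [0] * NR_CPUS  # per-CPU count of handled timer interrupts
    saved = [0] * NR_CPUS       # buddy's count as seen at each CPU's last check

    def tick(stuck):
        # CPUs that are not stuck handle one more timer interrupt.
        for cpu in range(NR_CPUS):
            if cpu not in stuck:
                interrupts[cpu] += 1

    def check(cpu):
        # The buddy looks stalled if its counter has not moved since last check.
        buddy = buddy_of(cpu)
        stalled = interrupts[buddy] == saved[cpu]
        saved[cpu] = interrupts[buddy]
        return stalled

    # Healthy period: every CPU ticks and records a baseline for its buddy.
    tick(set())
    for cpu in range(NR_CPUS):
        check(cpu)

    # Lockup period: CPUs in `locked` stop ticking and stop checking.
    tick(locked)
    return {buddy_of(cpu) for cpu in range(NR_CPUS)
            if cpu not in locked and check(cpu)}


def all_to_all_reports(locked):
    """All-to-all model: any live CPU can see every stalled counter."""
    return set(locked) if len(locked) < NR_CPUS else set()


# Try every combination with at least one locked and at least one live CPU.
missed = 0
for bits in product([False, True], repeat=NR_CPUS):
    locked = {cpu for cpu, stuck in enumerate(bits) if stuck}
    if not locked or len(locked) == NR_CPUS:
        continue  # nothing to detect, or nobody left to detect it
    if not buddy_reports(locked):
        missed += 1

print("lockup sets missed by the buddy model:", missed)
print("CPUs 1 and 2 locked, buddy reports:", buddy_reports({1, 2}))
print("CPUs 1 and 2 locked, all-to-all reports:", all_to_all_reports({1, 2}))
```

In this model the buddy scheme never misses a lockup as long as one CPU is still alive, because walking the ring backwards from any locked-up CPU always reaches a live CPU whose buddy is locked up. The difference shows up only in how much is reported: with CPUs 1 and 2 locked up, the buddy model reports CPU 1 alone, while the all-to-all model reports both, which matches the "better error message" point above.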