From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20230504221349.1535669-1-dianders@chromium.org> <20230504151100.v4.10.I00dfd6386ee00da25bf26d140559a41339b53e57@changeid>
From: Doug Anderson
Date: Fri, 19 May 2023 10:22:21 -0700
Subject: Re: [PATCH v4 10/17] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c
To: Petr Mladek
Cc: Andrew Morton, Sumit Garg, Mark Rutland, Matthias Kaehlcke,
 Stephane Eranian, Stephen Boyd, ricardo.neri@intel.com, Tzung-Bi Shih,
 Lecopzer Chen, kgdb-bugreport@lists.sourceforge.net, Masayoshi Mizuma,
 Guenter Roeck, Pingfan Liu, Andi Kleen, Ian Rogers,
 linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org,
 ito-yuichi@fujitsu.com, Randy Dunlap, Chen-Yu Tsai,
 christophe.leroy@csgroup.eu, davem@davemloft.net, sparclinux@vger.kernel.org,
 mpe@ellerman.id.au, Will Deacon, ravi.v.shankar@intel.com, npiggin@gmail.com,
 linuxppc-dev@lists.ozlabs.org, Marc Zyngier, Catalin Marinas, Daniel Thompson
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-perf-users@vger.kernel.org

Hi,

On Thu, May 11, 2023 at 8:46 AM Petr Mladek wrote:
>
> > @@ -111,6 +125,11 @@ static void watchdog_hardlockup_interrupt_count(void)
> >
> >  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >  {
> > +     if (__this_cpu_read(watchdog_hardlockup_touch)) {
> > +             __this_cpu_write(watchdog_hardlockup_touch, false);
> > +             return;
> > +     }
>
> If we clear watchdog_hardlockup_touch() here then
> watchdog_hardlockup_check() won't be called yet another
> watchdog_hrtimer_sample_threshold period.
>
> It means that any touch will cause ignoring one full period.
> The is_hardlockup() check will be done after full two periods.
>
> It is not ideal, see below.
>
> > +
> >       /*
> >        * Check for a hardlockup by making sure the CPU's timer
> >        * interrupt is incrementing. The timer interrupt should have
> > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> > index 9be90b2a2ea7..547917ebd5d3 100644
> > --- a/kernel/watchdog_perf.c
> > +++ b/kernel/watchdog_perf.c
> > @@ -112,11 +98,6 @@ static void watchdog_overflow_callback(struct perf_event *event,
> >       /* Ensure the watchdog never gets throttled */
> >       event->hw.interrupts = 0;
> >
> > -     if (__this_cpu_read(watchdog_nmi_touch) == true) {
> > -             __this_cpu_write(watchdog_nmi_touch, false);
> > -             return;
> > -     }
>
> The original code looks wrong. arch_touch_nmi_watchdog() caused
> skipping only one period of the perf event.
>
> I would expect that it caused restarting the period,
> something like:
>
>         if (__this_cpu_read(watchdog_nmi_touch) == true) {
>                 /*
>                  * Restart the period after which the interrupt
>                  * counter is checked.
>                  */
>                 __this_cpu_write(nmi_rearmed, 0);
>                 __this_cpu_write(last_timestamp, now);
>                 __this_cpu_write(watchdog_nmi_touch, false);
>                 return;
>         }
>
> In other words, we should restart the period in the very next perf
> event after the watchdog was touched.
>
> That said, the new code looks better than the original.
> IMHO, the original code was prone to false positives.

I had a little bit of a hard time following, but I _think_ the "tl;dr"
of all the above is that my change is fine. If I misunderstood, please
yell.


> Best Regards,
> Petr
>
> PS: It might be worth fixing this problem in a separate patch at the
>     beginning of this patchset. It might be a candidate for stable
>     backports.

Done. It's now its own patch and early in the series.
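
For readers following the thread, here is a minimal userspace sketch of the behaviour discussed above: a touch makes the next check return early, so a stalled CPU is only reported at the second check after the touch. This is an illustrative model, not kernel code. `struct wd_model`, `wd_touch()`, `wd_timer_tick()` and `wd_check()` are hypothetical names invented for the example; the per-CPU variables and the real check in kernel/watchdog.c are only approximated by plain struct fields.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of one CPU's hardlockup state.
 * Field names loosely mirror the thread; they are assumptions,
 * not the kernel's actual per-CPU variables.
 */
struct wd_model {
	bool touched;                  /* models watchdog_hardlockup_touch */
	unsigned long interrupts;      /* timer interrupts seen so far */
	unsigned long interrupts_saved;/* count recorded at the last check */
};

/* Models code petting the watchdog. */
static void wd_touch(struct wd_model *wd)
{
	wd->touched = true;
}

/* Models one timer interrupt arriving on the CPU. */
static void wd_timer_tick(struct wd_model *wd)
{
	wd->interrupts++;
}

/*
 * Models one periodic check. Returns true when a lockup would be
 * reported, i.e. no timer interrupt arrived since the last check.
 */
static bool wd_check(struct wd_model *wd)
{
	assert(wd);

	/* A pending touch consumes this whole check period. */
	if (wd->touched) {
		wd->touched = false;
		return false;
	}

	if (wd->interrupts == wd->interrupts_saved)
		return true;

	wd->interrupts_saved = wd->interrupts;
	return false;
}
```

In this model, a CPU that stops taking timer interrupts right after a touch survives the first `wd_check()` (the touch is consumed) and is reported by the second, which corresponds to the "full two periods" mentioned in the review.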