From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A091C433EF for ; Fri, 8 Oct 2021 15:12:20 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15EDD60F93 for ; Fri, 8 Oct 2021 15:12:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 15EDD60F93 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=Ut0/r5yK6GdLu9rbG5PvzohU2gzqozbAQOGDKCl0bGc=; b=X1PyS9WQnipxVz wk+yiDR92RX+6CQYng4P0OhEkCRXl+5/8Z5pbOeVn840ZnUgd5E0jAbwhwLvXT4dYol9Du+HhPDRJ QfSBXKHVDLmTGzWK4q4GxphHvIk4B0nPhZ6j0Emb67hviCWxt+ElWYb6EQUVL1LBsAUh1UZbRmw0+ Rmd4SPxnkGc8vYq4E8Fg0k6OH55RJ3Sx9gtu5mu6cHOT/4QpnOSyHM399KI0yscEQ0b/qDtLAV4JZ MWIU7vf5vROrU+VcBS0NLpRdVxMeOhVmjzm7OULCh4aHkJEe3PZpw8j7bnnem56EgwMgd0PnwJznH 9MVcbGqdYVYuTR4eofFA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mYrW2-003Cfj-2f; Fri, 08 Oct 2021 15:10:30 +0000 Received: from mail-pg1-x533.google.com ([2607:f8b0:4864:20::533]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mYrVy-003Cel-Af for linux-arm-kernel@lists.infradead.org; Fri, 08 Oct 2021 15:10:27 +0000 Received: by mail-pg1-x533.google.com with SMTP id q12so2672634pgq.12 for ; Fri, 08 Oct 2021 08:10:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=HrQMLNulJoJqcEgH5ag14CKeZC7bDRhJI4hv3UVlHGw=; b=iGj1rc4t8JyEZm69sGoZ+F4r1NMdloKM6khdy/QXwebqB3eHPyMae/nB4pQl4TMAYJ EN2bdIivFGDf6EcKTwM6hILirxhPkO+6nI7qePmt7IfnmkYn+u8f+1XL632RYsscalQ2 hQavFQVjAcrSi1QSYHJ2rrzoFQNYq9Fsbku9qJBdtMz7cPxijHLXn1gUBhZ4WL/4Tqfu GjC8L87rkDUSVp7GXke8sJZ3jf09HjRj1OEr+aWrlA9SHD/wzkrOE2EKbv+yCnOmr3p1 RlblX08SL2pmBBXV8qZJTRqw1JIA4WEmTsuR4aSS1nYGXqXiUF69vNwJjo/GSAvp+pf/ n6jA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=HrQMLNulJoJqcEgH5ag14CKeZC7bDRhJI4hv3UVlHGw=; b=cLfrq80Pofu9FH3gdlyIwMv0c3IcvbFL8s3oFpZ2uF02LjHwAF5p22KHwY8amNBTky QiPodA7U6dm54iZmXS4JnVITfmkEoYdhhpDh+cs1nWXoAn9O5Ly2FnzG4FcaJcCAp8Dd iAa5OLFhix4GANP5cM1iK3aRPMg/oXjEjdcE/CRDVz7f/Q3UtzTDawEEit1LJ2IBd3LI jxnzOnPUm+65/pAkh+IcnW10jVMxXI7b9J/kpUJBOUIJZBxDo8e1uLAmhgnex9HH4Q/P 8ghbyfschkQbq+K9w8S90GpUJIKnm/tEYpBKfJd4JIZnwP2tZHWb6gNYKEPBewMvjEXs 9zew== X-Gm-Message-State: AOAM532UR0GDAN1GmvtpqBl3ZEREcrCoWCC4Z0nIrZ8A8qGLwTKajzkv CC/ztphIEsKg8Oe2pmptZw== X-Google-Smtp-Source: ABdhPJwht56yioI0d2RMiaujCI60fKYPq/7RRjqtS9bLsnkYugcFncYYpz3sbEnzvgvsQOZdHqYTdQ== X-Received: by 2002:a63:454e:: with SMTP id u14mr5100266pgk.314.1633705824237; Fri, 08 Oct 2021 08:10:24 -0700 (PDT) Received: from piliu.users.ipa.redhat.com ([209.132.188.80]) by smtp.gmail.com with ESMTPSA id 8sm2972551pfi.194.2021.10.08.08.10.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Oct 2021 08:10:23 -0700 (PDT) Date: Fri, 8 Oct 2021 23:10:12 +0800 From: Pingfan Liu To: Petr Mladek Cc: linux-kernel@vger.kernel.org, Sumit Garg , Catalin Marinas , Will Deacon , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Marc Zyngier , Kees Cook , Masahiro Yamada , Sami Tolvanen , Andrew Morton , Wang Qing , "Peter Zijlstra (Intel)" , Santosh Sivaraj , linux-arm-kernel@lists.infradead.org Subject: Re: [PATCHv2 3/4] kernel/watchdog: adapt the watchdog_hld interface for async model Message-ID: References: <20210923140951.35902-1-kernelfans@gmail.com> <20210923140951.35902-4-kernelfans@gmail.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20211008_081026_421990_C2C368AD X-CRM114-Status: GOOD ( 39.99 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Fri, Oct 08, 2021 at 01:53:45PM +0800, Pingfan Liu wrote: > On Tue, Oct 05, 2021 at 09:03:17AM +0200, Petr Mladek wrote: > [...] > > > +static void lockup_detector_delay_init(struct work_struct *work); > > > +bool hld_detector_delay_initialized __initdata; > > > + > > > +struct wait_queue_head hld_detector_wait __initdata = > > > + __WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait); > > > + > > > +static struct work_struct detector_work __initdata = > > > + __WORK_INITIALIZER(detector_work, lockup_detector_delay_init); > > > + > > > +static void __init lockup_detector_delay_init(struct work_struct *work) > > > +{ > > > + int ret; > > > + > > > + wait_event(hld_detector_wait, hld_detector_delay_initialized); > > > + ret = watchdog_nmi_probe(); > > > + if (!ret) { > > > + nmi_watchdog_available = true; > > > + lockup_detector_setup(); > > > > Is it really safe to call the entire lockup_detector_setup() > > later? > > > > It manipulates also softlockup detector. And more importantly, > > the original call is before smp_init(). It means that it was > > running when only single CPU was on. > > > For the race analysis, lockup_detector_reconfigure() is on the centre stage. > Since proc_watchdog_update() can call lockup_detector_reconfigure() to > re-initialize both soft and hard lockup detector, so the race issue > should be already taken into consideration. > > > It seems that x86 has some problem with hardlockup detector as > > well. It later manipulates only the hardlockup detector. Also it uses > > cpus_read_lock() to prevent races with CPU hotplug, see > > fixup_ht_bug(). > > > Yes. But hardlockup_detector_perf_{stop,start}() can not meet the > requirement, since no perf_event is created yet. So there is no handy > interface to re-initialize hardlockup detector directly. > > > > > > + } else { > > > + WARN_ON(ret == -EBUSY); > > > + pr_info("Perf NMI watchdog permanently disabled\n"); > > > + } > > > +} > > > + > > > void __init lockup_detector_init(void) > > > { > > > + int ret; > > > + > > > if (tick_nohz_full_enabled()) > > > pr_info("Disabling watchdog on nohz_full cores by default\n"); > > > > > > cpumask_copy(&watchdog_cpumask, > > > housekeeping_cpumask(HK_FLAG_TIMER)); > > > > > > - if (!watchdog_nmi_probe()) > > > + ret = watchdog_nmi_probe(); > > > + if (!ret) > > > nmi_watchdog_available = true; > > > + else if (ret == -EBUSY) > > > + queue_work_on(smp_processor_id(), system_wq, &detector_work); > > > > IMHO, this is not acceptable. It will block one worker until someone > > wakes it. Only arm64 will have a code to wake up the work and only > > when pmu is successfully initialized. In all other cases, the worker > > will stay blocked forever. > > > What about consider -EBUSY and hld_detector_delay_initialized as a unit? ^^^ unity > If watchdog_nmi_probe() returns -EBUSY, then > set the state of ld_detector_delay_initialized as "waiting", and then moved to state "finished". > > And at the end of do_initcalls(), check the state is "finished". If not, > then throw a warning and wake up the worker. > > > The right solution is to do it the other way. Queue the work > > from arm64-specific code when armv8_pmu_driver_init() succeeded. > > > Could it be better if watchdog can provide a common framework for future > extension instead of arch specific? The 2nd argument is to avoid the > message "Perf NMI watchdog permanently disabled" while later enabling > it. (Please see > lockup_detector_init()->watchdog_nmi_probe()->hardlockup_detector_perf_init(), > but if providing arch specific probe method, it can be avoided) > Sorry for poor expression. I have not explained it completely for the second point. Since using arch specific watchdog_nmi_probe() to avoid misleading message "Perf NMI watchdog permanently disabled", then -EBUSY should be returned. And from watchdog level, it should know how to handle error, that is to say queue_work_on(smp_processor_id(), system_wq, &detector_work). Thanks, Pingfan > > Also I suggest to flush the work to make sure that it is finished > > before __init code gets removed. > > > Good point, and very interesting. I will look into it. > > > > > The open question is what code the work will call. As mentioned > > above, I am not sure that lockup_detector_delay_init() is safe. > > IMHO, we need to manipulate only hardlockup detector and > > we have to serialize it against CPU hotplug. > > > As explained ahead, it has already consider the race against CPU > hotplug. > > Thanks, > > Pingfan > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel